Abstract. We work out some relations between duality and intertwining in
the context of discrete Markov chains, fixing up the background of previous 
relations first established for birth and death chains and their Siegmund duals. 
In view of the results, the monotone properties resulting from the Siegmund 
dual of birth and death chains are revisited in some detail, with emphasis on 
the non neutral Moran model. We also introduce an ultrametric type dual 
extending the Siegmund kernel. Finally we discuss the sharp dual, following 
closely the Diaconis-Fill study. 
1. Introduction

Our work is devoted to the study of duality and intertwining relations between
Markov chain kernels. Although these concepts can be stated purely as relations
between matrices, as we define them in the next section, our study focuses on their
probabilistic consequences. For this purpose we need the matrices to be non negative
and substochastic, so that a dual Markov chain can be defined. The fact that the
intertwining kernel is stochastic allows a rich probabilistic interpretation that has
been given in [2], [4], [7] and [8].



A main problem is the existence of a duality relationship between substochastic
kernels. Indeed, once this fact is established, several relations can be deduced
when the starting chain is irreducible and positive recurrent. This is the statement
of one of our main results, Theorem 2. The hypotheses of this theorem rely
on a duality relationship between kernels.



In the following sections, we find additional examples where these duality relations
between substochastic kernels can be established: for the well-known Siegmund
kernel the hypotheses of Theorem 2 are verified for monotone chains, see Corollary
6; for a generalized ultrametric potential kernel some conditions for the existence of
the dual are given in Proposition 11; for birth and death (BD) chains the properties
derived from monotonicity are summarized in Corollary 7; and in Proposition 10
we show that the non-neutral Moran model is monotone when its bias mechanism
is nondecreasing.

For birth and death chains, we revisit the properties relating non negative spectrum
and monotonicity (see Proposition 9) and for the Moran model we identify some
cases with non negative spectra and also when stronger properties are satisfied.

Section 5 follows closely the ideas on sharp stationary times and duals developed
in [2], [1] and [7]. In Proposition 13 we show a sharpness result alluded to in Remark
2.39 of [2] and in Theorem 2.1 in [7]. One of its corollaries is Proposition 14, where
the condition for sharpness is written in terms of the dual function. This applies to
the intertwining of a monotone chain under the Siegmund dual; in this case both
chains can start from the state 0. In the BD case we also study some quantitative
aspects of the absorption time.

We point out that even if duality and intertwining can be set for Markov chains
acting on general state spaces and/or in continuous time, we restrict ourselves to
discrete time and space in order to present our main results quickly
and to avoid introducing additional burdensome notation.

2. Duality and Intertwining 

2.1. Notation. Let $I$ be a countable set. By $\mathcal{F}(I)$ we denote the set of real
functions defined on $I$, and by $\mathcal{F}_b(I)$ and $\mathcal{F}_+(I)$ we denote respectively its bounded and positive
elements. Since $I$ is countable the set $\mathcal{F}(I)$ is identified with the set of vectors
$\mathbb{R}^I$. Let $\partial$ be a point that does not belong to $I$, and denote $I^\partial := I \cup \{\partial\}$. Every
$f \in \mathcal{F}(I)$ is extended canonically to a function $f^\partial \in \mathcal{F}(I^\partial)$ that satisfies $f^\partial(\partial) = 0$.

If $A$ is any set we denote by $\mathbf{1}_A$ or $\mathbf{1}(A)$ its characteristic function. We denote by $\mathbf{1}$
the unit function defined on $I$ (or on the other sets $\widehat{I}$ and $\widetilde{I}$ that we introduce further).

A non negative matrix $P = (P(x,y) : x, y \in I)$ is called a kernel on $I$. (Sometimes
we will emphasize the non negativity by saying a non negative kernel.) It obviously
acts on the set $\mathcal{F}_+(I)$. A substochastic kernel is such that $P\mathbf{1} \le \mathbf{1}$, it is stochastic
when the equality $P\mathbf{1} = \mathbf{1}$ holds, and strictly substochastic if it is substochastic
and there exists some $x \in I$ such that $P\mathbf{1}(x) < 1$. When $P$ is substochastic, it
also acts on $\mathcal{F}_b(I)$.

The kernel $P$ is irreducible when for any pair $x, y \in I$ there exists $n > 0$ such that
$P^n(x,y) > 0$.

A point $x_0 \in I$ is an absorbing point of the kernel $P$ when $P(x_0, y) = \delta_{y,x_0}$ for all
$y \in I$.

When $P$ is a substochastic kernel there exists a uniquely defined (in distribution)
Markov chain $X = (X_n : n < T^X)$ taking values in the countable set $I$, with
lifetime $T^X$ and with transition kernel $P$. We have the equality $P = P_X$ where $P_X$
is the kernel acting on the set of functions $\mathcal{F}_b(I)$ (or $\mathcal{F}_+(I)$) by

$P_X f(x) = \mathbb{E}_x(f(X_1) \cdot \mathbf{1}(T^X > 1))$, $x \in I$.

$P$ generates the semigroup $(P^n : n \ge 1)$, each matrix $P^n$ acting on $\mathcal{F}_b(I)$ or
$\mathcal{F}_+(I)$, and it verifies

$P^n f(x) = \mathbb{E}_x(f(X_n) \cdot \mathbf{1}(T^X > n))$, $x \in I$, $n \ge 1$.

The lifetime $T^X$ is such that

• If $P$ is stochastic then $T^X = +\infty$ $\mathbb{P}_x$-a.s. for all $x \in I$;

• If $P$ is strictly substochastic then there exists some $x \in I$ such that
$\mathbb{P}_x(T^X < +\infty) > 0$. When $P$ is irreducible and strictly substochastic then
for all $x \in I$ it holds $\mathbb{P}_x(T^X < +\infty) = 1$.

The kernels will be denoted by $P$, $\widehat{P}$, $\widetilde{P}$; they will be defined on the countable sets
$I$, $\widehat{I}$, $\widetilde{I}$ respectively. When these kernels are substochastic the associated Markov
chains will be respectively denoted by $X$, $\widehat{X}$, $\widetilde{X}$, and the lifetimes of these chains
will be respectively $T$, $\widehat{T}$, $\widetilde{T}$.

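2.2. Strictly substochastic kernel. If $P$ is strictly substochastic we can add a
new state $\partial$ to $I$, and $X$ is extended to the Markov chain $X^\partial = (X^\partial_t : t \ge 0)$ by

$X^\partial_t = X_t$ for $t < T^X$; $X^\partial_t = \partial$ for $t \ge T^X$,

so $\partial$ is an absorbing state of the new chain. The transition kernel $P_{X^\partial}$ of $X^\partial$ is
stochastic and it is given by

$P_{X^\partial} g(x) = \mathbb{E}_x(g(X^\partial_1) \cdot \mathbf{1}(T^X > 1)) + g(\partial)\, \mathbb{P}_x(T^X \le 1)$,
for all $g \in \mathcal{F}_b(I^\partial)$ or $g \in \mathcal{F}_+(I^\partial)$. Then,

(1) $[g(\partial) = 0] \Rightarrow [(P^n_{X^\partial} g)|_I = P^n(g|_I), \ \forall n \ge 1]$.

Therefore, since the canonical extension of $f \in \mathcal{F}(I)$ to $f^\partial \in \mathcal{F}(I^\partial)$ satisfies $f^\partial(\partial) = 0$,
the right hand side of (1) is verified for $g = f^\partial$.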
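The extension by an absorbing cemetery state and relation (1) can be illustrated with a minimal sketch in exact rational arithmetic. The helper names `matmul` and `extend_with_cemetery`, the 2-state kernel and the test function `g` are hypothetical choices made only for this illustration; the cemetery is stored as the last index.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def extend_with_cemetery(P):
    """Return the stochastic kernel on I plus a cemetery state (last index)."""
    n = len(P)
    ext = [list(row) + [1 - sum(row)] for row in P]  # killed mass goes to the cemetery
    ext.append([F(0)] * n + [F(1)])                  # the cemetery is absorbing
    return ext

# Hypothetical strictly substochastic kernel on I = {0, 1}.
P = [[F(1, 2), F(1, 4)],
     [F(1, 3), F(1, 3)]]
P_ext = extend_with_cemetery(P)

g = [F(5), F(-2), F(0)]                 # g vanishes at the cemetery, as required in (1)
lhs = [[v] for v in g]                  # column vector on the extended space
rhs = [[v] for v in g[:2]]              # restriction of g to I
for n in range(1, 6):
    lhs = matmul(P_ext, lhs)            # n-th power of the extended kernel applied to g
    rhs = matmul(P, rhs)                # P^n applied to the restriction of g
    assert [row[0] for row in lhs[:2]] == [row[0] for row in rhs]
```

The loop checks relation (1) for n = 1, ..., 5 on this example only; it is a sanity check, not a proof.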

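We recall that $h \in \mathcal{F}_b(I)$ (or $h \in \mathcal{F}_+(I)$) is a $P$-harmonic function if $Ph = h$, or
equivalently if it verifies

$\mathbb{E}_x(h(X_n) \cdot \mathbf{1}(T > n)) = h(x)$, $\forall x \in I$, $\forall n \ge 1$.

Its extension $h^\partial \in \mathcal{F}_b(I^\partial)$ (or $h^\partial \in \mathcal{F}_+(I^\partial)$) with $h^\partial(\partial) = 0$ is a
$P_{X^\partial}$-harmonic function.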

Let us denote by

$T^X_J = \inf\{n \ge 0 : X_n \in J\}$

the hitting time of $J \subseteq I$ by the chain $X$, where as usual we put $+\infty = \inf \emptyset$. When
$J = \{a\}$ is a singleton we write $T^X_a$ instead of $T^X_{\{a\}}$. Observe that with this notation
we have

$T^X = T^{X^\partial}_\partial$.
To simplify the notation, for the Markov chains $X$, $\widehat{X}$, $\widetilde{X}$, the hitting times are
denoted respectively by $T_J = T^X_J$, $\widehat{T}_J = T^{\widehat{X}}_J$, $\widetilde{T}_J = T^{\widetilde{X}}_J$ (when $J$ is a subset of $I$, $\widehat{I}$,
$\widetilde{I}$, respectively).

Let us recall the structure of a non irreducible substochastic kernel $P$. In this case,
up to permutation, we can partition

$I = \bigcup_{l=1}^{\ell} I_l$

in such a way that (see [9], Section 8.3):

$P_{I_l \times I_l}$ is irreducible $\forall l \in \{1, \dots, \ell\}$,

and

$\forall x \in I_l,\ y \in I_{l'}$ : $P(x,y) > 0 \Rightarrow l \le l'$.
If $P$ is stochastic then the last of these submatrices, $P_{I_\ell \times I_\ell}$, is stochastic, that is
$P_{I_\ell \times I_\ell} \mathbf{1}_{I_\ell} = \mathbf{1}_{I_\ell}$, and there could be other stochastic submatrices. If $P$ is strictly
substochastic then none or some of these submatrices $P_{I_l \times I_l}$, $l = 1, \dots, \ell$, could be
stochastic. We put

$St(P) = \{I_l : P_{I_l \times I_l} \text{ is stochastic},\ l \in \{1, \dots, \ell\}\}$.

Then, when $P$ is stochastic $St(P) \neq \emptyset$, and if $P$ is strictly substochastic then $St(P)$
could be empty or not. When $St(P) \neq \emptyset$ it could contain a unique class or
not, and by a simple permutation we can always assume that it contains $I_\ell$
(this permutation is not needed when $P$ is stochastic).

2.3. Definitions. We recall the duality and the intertwining relations. As usual
$M'$ denotes the transpose of the matrix $M$, that is $M'(x,y) = M(y,x)$ for all $x, y \in I$.

Definition 1. Let $P$ and $\widehat{P}$ be two kernels defined on the countable sets $I$ and $\widehat{I}$,
and let $H = (H(x,y) : x \in I, y \in \widehat{I})$ be a non negative matrix. Then $\widehat{P}$ is said to
be a $H$-dual of $P$ if it verifies

(2) $H\widehat{P}' = PH$.

We call $H$ a dual function between $(P, \widehat{P})$. □

Note that for a kernel $P$ the $H$-dual $\widehat{P}$ exists when (2) holds and $\widehat{P} \ge 0$.

When $|I| = |\widehat{I}|$ is finite and $H$ is nonsingular we get that

$\widehat{P}' = H^{-1}PH$,

and so $\widehat{P}'$ and $P$ are similar matrices and have the same spectrum.

Duality is a symmetric notion between kernels, because if $\widehat{P}$ is a $H$-dual of $P$, then
$P$ is a $H'$-dual of $\widehat{P}$.

We will assume that the non negative dual matrix $H$ is nontrivial, in the sense
that no row and no column vanishes completely. On the other hand note that if
$H$ is a dual function between $(P, \widehat{P})$ then for all $c > 0$, $cH$ is also a dual function
between these matrices. Then, when it is necessary, we can always multiply all the
coefficients of $H$ by a strictly positive constant.

This notion of duality (2) coincides with the one between Markov processes that
can be found in references [T5], [T5] and [I] among others. Indeed, let $P$ and $\widehat{P}$ be
substochastic and let $X$ and $\widehat{X}$ be Markov chains with kernels $P$ and $\widehat{P}$ respectively.
Then, if $\widehat{P}$ is a $H$-dual of $P$, we have that $\widehat{X}$ is a $H$-dual of $X$, which means that

(3) $\forall x \in I,\ y \in \widehat{I},\ \forall n \ge 0$ : $\mathbb{E}_x(H(X_n, y)) = \mathbb{E}_y(H(x, \widehat{X}_n))$,

where we have extended $H$ to $(I \cup \{\partial\}) \times (\widehat{I} \cup \{\partial\})$ by putting $H(x, \partial) = H(\partial, y) =
H(\partial, \partial) = 0$, for all $x \in I$, $y \in \widehat{I}$.

Let us now introduce intertwining. 

Definition 2. Let $P$ and $\widetilde{P}$ be two kernels defined on the countable sets $I$ and $\widetilde{I}$
and let $\Lambda = (\Lambda(y,x) : y \in \widetilde{I}, x \in I)$ be a stochastic matrix. We say that $\widetilde{P}$ is a
$\Lambda$-intertwining of $P$ if it verifies

$\widetilde{P}\Lambda = \Lambda P$.

$\Lambda$ is called a link between $(P, \widetilde{P})$. □

When $|I| = |\widetilde{I}|$ is finite and $\Lambda$ is nonsingular we get

$\widetilde{P} = \Lambda P \Lambda^{-1}$,

and so $P$ and $\widetilde{P}$ are similar and have the same spectrum.

Let $P$ and $\widetilde{P}$ be substochastic and denote by $X$ and $\widetilde{X}$ the associated Markov
chains. If $\widetilde{P}$ is a $\Lambda$-intertwining of $P$ we say that $\widetilde{X}$ is a $\Lambda$-intertwining of $X$.
Intertwining is not a symmetric relation because $\Lambda'$ is not necessarily
stochastic. But when $\Lambda$ is doubly stochastic, the fact that $\widetilde{P}$ is a $\Lambda$-intertwining of
$P$ implies that $P$ is a $\Lambda'$-intertwining of $\widetilde{P}$.

The stochastic intertwining between Markov chains has been deeply studied in [5] , 
01, m and 0. 

3. Relations between Duality and Intertwining 

Let us introduce additional notation: 

• By $e_a$ we denote a column vector with 0 entries except for its $a$-th entry
which is 1;

• When $P$ is an irreducible positive recurrent stochastic kernel, we denote by
$\pi = (\pi(x) : x \in I)$ its stationary distribution and we write it as a column
vector. So $\pi'P = \pi'$, where $\pi'$ is the row vector transpose of $\pi$.

Now we give a result on intertwining that will be often used. 
Proposition 1. Let $P$ be an irreducible positive recurrent stochastic kernel and $\pi$
be its stationary distribution. Assume $\widetilde{P}$ is a kernel that is a $\Lambda$-intertwining of
$P$, $\widetilde{P}\Lambda = \Lambda P$. If $a$ is an absorbing state in $\widetilde{P}$ then,

(4) $\pi' = e_a'\Lambda$.

Proof. Since the chain $P$ is positive recurrent with stationary distribution $\pi$ and $\Lambda$
is stochastic we get $\lim_{k\to\infty} \frac{1}{k}\sum_{n=0}^{k-1} (\Lambda P^n)(x,y) = \pi(y)$, in particular

(5) $\lim_{k\to\infty} \frac{1}{k}\sum_{n=0}^{k-1} (\Lambda P^n)(a,y) = \pi(y)$.

On the other hand from the assumption we get $\widetilde{P}^n(a,y) = \delta_{y,a}$ and then

(6) $(\widetilde{P}^n\Lambda)(a,y) = \Lambda(a,y)$ $\forall n \ge 0$, $y \in I$.

From the intertwining relation we have $\widetilde{P}^n\Lambda = \Lambda P^n$ for all $n \ge 1$, and so from (5) and (6) we deduce
$\Lambda(a,y) = \pi(y)$, which is equivalent to $(e_a'\Lambda)(y) = \pi(y)$. Then (4) is shown. □



For a vector $\rho \in \mathbb{R}^I$ we will denote by $D_\rho$ the diagonal matrix with terms $(D_\rho)(x,x) =
\rho(x)$, $x \in I$.

Let $P$ be an irreducible positive recurrent stochastic kernel with stationary
distribution $\pi$. By irreducibility we have $\pi > 0$. Denote by $\overleftarrow{P}$ the transition kernel of
the reversed chain of $X$, so $\overleftarrow{P}(x,y) = \pi(x)^{-1}P(y,x)\pi(y)$ or equivalently

(7) $\overleftarrow{P}' = D_\pi P D_\pi^{-1}$.

We have that $\overleftarrow{P}$ is in duality with $P$ via $H = D_\pi^{-1}$. Note that $\overleftarrow{P}$ is also irreducible
and positive recurrent with stationary distribution $\pi$ and that $P' = D_\pi \overleftarrow{P} D_\pi^{-1}$, so
we can exchange the roles of $P$ and $\overleftarrow{P}$. In the reversible case $\overleftarrow{P} = P$, the relation
(7) expresses a self duality.

Let us give one of our main results that can be viewed as the generalization of 
Theorem 5.5 in [4] devoted to birth and death chains. 



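Theorem 2. Let $P$ be an irreducible positive recurrent stochastic kernel and let $\pi$
be its stationary distribution. Assume $\widehat{P}$ is a (non negative) kernel and that it is a
$H$-dual of $P$, $H\widehat{P}' = PH$, where $H$ is nontrivial. Then

(i) $\widehat{P}H'D_\pi = H'D_\pi\overleftarrow{P}$.

(ii) The vector $\varphi := H'\pi$ is strictly positive and it verifies

$\widehat{P}\varphi = \varphi$.

(iii) $\widetilde{P} = D_\varphi^{-1}\widehat{P}D_\varphi$ is a stochastic kernel and it is a $\Lambda$-intertwining of $\overleftarrow{P}$ with a
stochastic link $\Lambda$; more precisely

(8) $\widetilde{P}\Lambda = \Lambda\overleftarrow{P}$ with $\Lambda := D_\varphi^{-1}H'D_\pi$, and they verify $\widetilde{P}\mathbf{1} = \mathbf{1} = \Lambda\mathbf{1}$.
Moreover we have the duality relation

$K\widetilde{P}' = PK$ with $K := HD_\varphi^{-1}$.

(iv) Let $I$ and $\widehat{I}$ be finite and $\widehat{P}$ be substochastic. Then:

(iv1) When $\widehat{P}$ is stochastic and irreducible then $\varphi = c\mathbf{1}$ for some $c > 0$, and $\widetilde{P} = \widehat{P}$.
(iv2) If $\widehat{P}$ is strictly substochastic then it is not irreducible.

(iv3) If $\widehat{P}$ is non irreducible then $St(\widehat{P}) \neq \emptyset$ and there exist some constants $c_l > 0$
for $\widehat{I}_l \in St(\widehat{P})$ such that

(9) $\varphi(x) = \sum_{\widehat{I}_l \in St(\widehat{P})} c_l\, \mathbb{P}_x(\lim_{n\to\infty} \widehat{X}_n \in \widehat{I}_l) = \sum_{\widehat{I}_l \in St(\widehat{P})} c_l\, \mathbb{P}_x(\widehat{T}_{\widehat{I}_l} < \widehat{T})$.

(iv4) If $\widehat{P}$ has a unique stochastic class $\widehat{I}_\ell$, then

(10) $\frac{\varphi(x)}{\varphi(y)} = \mathbb{P}_x(\widehat{T}_{\widehat{I}_\ell} < \widehat{T})$ for any $y \in \widehat{I}_\ell$,

and the intertwining Markov chain $\widetilde{X}$ is given by the Doob transform

(11) $\mathbb{P}_x(\widetilde{X}_1 = y_1, \dots, \widetilde{X}_k = y_k) = \mathbb{P}_x(\widehat{X}_1 = y_1, \dots, \widehat{X}_k = y_k \mid \widehat{T}_{\widehat{I}_\ell} < \widehat{T})$.

(v) If $a$ is an absorbing state in $\widehat{P}$ then $a$ is an absorbing state in $\widetilde{P}$ and $\pi' = e_a'\Lambda$
holds. Moreover the sets of absorbing points in $\widehat{P}$ and $\widetilde{P}$ coincide.

(vi) If $|I| = |\widehat{I}|$ is finite and $H$ is nonsingular then $\widehat{P} = H'D_\pi\overleftarrow{P}D_\pi^{-1}H'^{-1}$ and
$\widetilde{P} = \Lambda\overleftarrow{P}\Lambda^{-1}$. Hence $\widehat{P}$, $\overleftarrow{P}$, $\widetilde{P}$ are similar matrices and $P$, $\widehat{P}$, $\widetilde{P}$ have the same
spectrum.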

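Proof. From $H\widehat{P}' = PH$ and (7) we find

$\widehat{P}H' = H'P' = H'D_\pi\overleftarrow{P}D_\pi^{-1}$.

By multiplying to the right by $D_\pi$ we get (i). Part (vi) follows directly in the
finite nonsingular case.

Since $D_\pi\mathbf{1} = \pi$ we get that $\widehat{P}H'\pi = H'D_\pi\overleftarrow{P}\mathbf{1}$. Since $\overleftarrow{P}$ is stochastic we get
$\widehat{P}H'\pi = H'D_\pi\mathbf{1} = H'\pi$. Let $\varphi = H'\pi$. Since $\pi > 0$ and each column of $H$ (that is, each
row of $H'$) contains a strictly positive element, we have $\varphi > 0$. Then (ii) holds. Now define

$\widetilde{P} = D_\varphi^{-1}\widehat{P}D_\varphi$.

By using (i) we get

$\widetilde{P}D_\varphi^{-1}H'D_\pi = D_\varphi^{-1}\widehat{P}H'D_\pi = D_\varphi^{-1}H'D_\pi\overleftarrow{P}$.

Then the relation $\widetilde{P}\Lambda = \Lambda\overleftarrow{P}$ in (iii) holds; moreover

$\widetilde{P}\mathbf{1} = D_\varphi^{-1}\widehat{P}D_\varphi\mathbf{1} = D_\varphi^{-1}\widehat{P}\varphi = D_\varphi^{-1}\varphi = \mathbf{1}$,
$\Lambda\mathbf{1} = D_\varphi^{-1}H'D_\pi\mathbf{1} = D_\varphi^{-1}H'\pi = D_\varphi^{-1}\varphi = \mathbf{1}$.
Then $\widetilde{P}$ and $\Lambda$ are Markov kernels. Finally from the equality

$HD_\varphi^{-1}\widetilde{P}'D_\varphi = H\widehat{P}' = PH$,

the relation $K\widetilde{P}' = PK$ is straightforward. Hence (iii) is verified.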

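Now assume $\widehat{I}$ is finite. If $\widehat{P}$ is an irreducible strictly substochastic kernel then
necessarily its spectral radius is strictly smaller than 1, which contradicts the equality
$\widehat{P}\varphi = \varphi$, because $\varphi > 0$. In the case $\widehat{P}$ is stochastic and irreducible, the equation
$\widehat{P}\varphi = \varphi$, $\varphi > 0$, implies $\varphi = c\mathbf{1}$ for some constant $c > 0$. So (iv1) and (iv2) follow.

Now assume that the matrix $\widehat{P}$ is substochastic and non irreducible. Let $\widehat{I} = \bigcup_{l=1}^{\ell}\widehat{I}_l$
be the partition in irreducible components $\widehat{P}_{\widehat{I}_l \times \widehat{I}_l}$ such that $x \in \widehat{I}_l$, $y \in \widehat{I}_{l'}$ and
$\widehat{P}(x,y) > 0$ implies $l' \ge l$. Restricting $\widehat{P}\varphi = \varphi$ to the last class, the submatrix $\widehat{P}_{\widehat{I}_\ell \times \widehat{I}_\ell}$ verifies

$\widehat{P}_{\widehat{I}_\ell \times \widehat{I}_\ell}\,\varphi|_{\widehat{I}_\ell} = \varphi|_{\widehat{I}_\ell}$.

Then $\widehat{P}_{\widehat{I}_\ell \times \widehat{I}_\ell}$ is an irreducible substochastic matrix whose Perron-Frobenius
eigenvalue is 1, so we deduce that $\widehat{P}_{\widehat{I}_\ell \times \widehat{I}_\ell}$ is stochastic and $\varphi|_{\widehat{I}_\ell} = c_\ell\mathbf{1}_{\widehat{I}_\ell}$ for some constant
$c_\ell > 0$, so $St(\widehat{P}) \neq \emptyset$. Then, if $(\widehat{I}_l \in St(\widehat{P}))$ are the irreducible stochastic classes the
same argument implies that $\varphi|_{\widehat{I}_l} = c_l\mathbf{1}_{\widehat{I}_l}$ for some quantity $c_l > 0$ and this happens
for all $\widehat{I}_l \in St(\widehat{P})$.

Let $\widehat{X} = (\widehat{X}_t : t < \widehat{T})$ be the Markov chain with kernel $\widehat{P}$. It is known that all the
trajectories that are not killed are attracted by $\bigcup_{\widehat{I}_l \in St(\widehat{P})}\widehat{I}_l$, that is

$\mathbb{P}_x\big(\lim_{n\to\infty}\widehat{X}_n \in \bigcup_{\widehat{I}_l \in St(\widehat{P})}\widehat{I}_l \mid \widehat{T} = \infty\big) = 1$.

On the other hand the equality $\widehat{P}\varphi = \varphi$ expresses that $\varphi$ is a harmonic function
for the chain $\widehat{X}$. Hence, for all $n \ge 0$ it is verified,

$\varphi(x) = \mathbb{E}_x(\varphi(\widehat{X}_n), \widehat{T} > n)$
$= \sum_{\widehat{I}_l \in St(\widehat{P})} \mathbb{E}_x(\varphi(\widehat{X}_n), \widehat{T} > n, \widehat{T}_{\widehat{I}_l} < \widehat{T})$
$+ \mathbb{E}_x(\varphi(\widehat{X}_n), \widehat{T} > n, \widehat{T} < \min\{\widehat{T}_{\widehat{I}_l} : \widehat{I}_l \in St(\widehat{P})\})$.

Then, by taking $n \to \infty$ in the above expression and since $\lim_{n\to\infty}\mathbb{P}_x(\min\{\widehat{T}_{\widehat{I}_l} : \widehat{I}_l \in
St(\widehat{P})\} > \widehat{T} > n) = 0$, we get the relation (9),

$\varphi(x) = \sum_{\widehat{I}_l \in St(\widehat{P})} c_l\,\mathbb{P}_x(\widehat{T}_{\widehat{I}_l} < \widehat{T})$.

Let us prove part (iv4). Since there is a unique stochastic class the equality (10)
follows straightforwardly. Then the transition probabilities of $\widetilde{P}$ are given by the
Doob $h$-transform

$\widetilde{P}(x,y) = \mathbb{P}_x(\widehat{T}_{\widehat{I}_\ell} < \widehat{T})^{-1}\,\widehat{P}(x,y)\,\mathbb{P}_y(\widehat{T}_{\widehat{I}_\ell} < \widehat{T}) = \mathbb{P}_x(\widehat{X}_1 = y \mid \widehat{T}_{\widehat{I}_\ell} < \widehat{T})$, $\forall x, y \in \widehat{I}$.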




The Markov property gives the formula for every cylinder. 

Finally, let us show part (v). Since the chain $\overleftarrow{P}$ is positive recurrent with stationary
distribution $\pi$, by Proposition 1 it suffices to show that $a$ is an absorbing state for $\widetilde{P}$. This follows
straightforwardly from the equality $\widetilde{P} = D_\varphi^{-1}\widehat{P}D_\varphi$; indeed it implies $\widetilde{P}(a,y) =
\widehat{P}(a,y)\frac{\varphi(y)}{\varphi(a)} = \delta_{y,a}$. This also proves the equality of the sets of absorbing points for
both kernels $\widehat{P}$ and $\widetilde{P}$. □
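As a numerical sanity check of parts (i), (ii), (iii) and (v) of Theorem 2, the following sketch builds the objects of the theorem in exact rational arithmetic. The 3-state kernel `P`, its stationary law `pi` and the dual `P_hat` (a Siegmund dual, with $H(x,y) = \mathbf{1}(x \le y)$) are illustrative assumptions, and the helper names are hypothetical.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(v):
    return [[v[i] if i == j else F(0) for j in range(len(v))] for i in range(len(v))]

# Illustrative data (an assumption, not taken from the text): a reversible
# birth and death kernel P on {0, 1, 2}, its stationary law pi, and a dual
# kernel P_hat that is strictly substochastic (row 0 sums to 2/3).
P = [[F(2, 3), F(1, 3), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
pi = [F(3, 10), F(2, 5), F(3, 10)]
P_hat = [[F(5, 12), F(1, 4), F(0)],
         [F(1, 4), F(5, 12), F(1, 3)],
         [F(0), F(0), F(1)]]
n = len(P)
H = [[F(int(x <= y)) for y in range(n)] for x in range(n)]    # H(x, y) = 1(x <= y)
Ht = transpose(H)

assert matmul(H, transpose(P_hat)) == matmul(P, H)             # hypothesis (2)

D_pi = diag(pi)
P_rev = matmul(matmul(diag([1 / v for v in pi]), transpose(P)), D_pi)  # reversed kernel
phi = [row[0] for row in matmul(Ht, [[v] for v in pi])]                 # phi = H' pi
D_phi_inv = diag([1 / v for v in phi])
Lam = matmul(matmul(D_phi_inv, Ht), D_pi)                               # link Lambda
P_tilde = matmul(matmul(D_phi_inv, P_hat), diag(phi))                   # intertwining kernel

assert matmul(matmul(P_hat, Ht), D_pi) == matmul(matmul(Ht, D_pi), P_rev)  # part (i)
assert [row[0] for row in matmul(P_hat, [[v] for v in phi])] == phi        # part (ii)
assert matmul(P_tilde, Lam) == matmul(Lam, P_rev)                          # relation (8)
assert all(sum(row) == 1 for row in P_tilde + Lam)                         # both stochastic
assert Lam[n - 1] == pi                                                    # part (v), a = N
```

Exact fractions are used so that the matrix identities can be tested with equality rather than with a floating point tolerance.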



Remark 1. We can exchange the roles of $P$ and $\overleftarrow{P}$ in the irreducible and positive
recurrent case. Thus, in the hypothesis of the Theorem we can take $\overleftarrow{P}$ instead of
$P$, so $\widehat{P}$ is a $H$-dual of $\overleftarrow{P}$, $H\widehat{P}' = \overleftarrow{P}H$, and in all the statements of the Theorem
we must replace $\overleftarrow{P}$ by $P$. □

Remark 2. A probabilistic explanation of how $\varphi := H'\pi > 0$ appears can be given
when $\widehat{P}$ is substochastic and H is bounded. In this case the dual relation $H\widehat{P}' = PH$
is expressed by (3),

$\forall x \in I,\ y \in \widehat{I},\ \forall n \ge 0$ : $\mathbb{E}_x(H(X_n, y)) = \mathbb{E}_y(H(x, \widehat{X}_n))$.

Since by hypothesis $X$ is an irreducible and positive recurrent Markov chain, $\varphi$
appears as the following limit of the left hand side,

$\lim_{k\to\infty} \frac{1}{k}\sum_{n<k} \mathbb{E}_x(H(X_n, y)) = \sum_{u \in I} \pi(u)H(u,y) = \varphi(y)$.

□

Remark 3. We have

(12) $\Lambda(x,y) = \frac{1}{\varphi(x)}H(y,x)\pi(y)$,

in particular $\Lambda(x,y) = 0$ if and only if $H(y,x) = 0$. □

Remark 4. The formulas in Theorem 2 state that $\Lambda$, $\widetilde{P}$ and $\widehat{P}$ are invariant when
$H$ is multiplied by a strictly positive constant. Then, we can fix $c > 0$ and take
$cH$ in order to have $\varphi(x) = 1$ for all $x \in \widehat{I}_l$, or equivalently $c_l = 1$, for some fixed
stochastic class $\widehat{I}_l \in St(\widehat{P})$.

Remark 5. When the starting equality between stochastic kernels is the
intertwining relation $\widetilde{P}\Lambda = \Lambda\overleftarrow{P}$, then we have the duality relation $H\widehat{P}' = PH$ with
$H = D_\pi^{-1}\Lambda'$ and $\widehat{P} = \widetilde{P}$. In this case $\varphi = \mathbf{1}$.

We note the equality $\widehat{I} = \widetilde{I}$ of the sets where the kernels $\widehat{P}$ and $\widetilde{P}$ are defined
in Theorem 2. On the other hand we recall that in the finite case the positive
recurrence property of $P$ follows from irreducibility.

Proposition 3. Assume $H$ is nonsingular and has a constant column that is strictly
positive, that is

$\exists a \in \widehat{I}$ : $He_a = c\mathbf{1}$ for some $c > 0$.

Then:
(i) $a$ is an absorbing state for $\widehat{P}$ (so $\{a\}$ is a stochastic class).

(ii) Under the hypotheses of Theorem 2, $\pi' = e_a'\Lambda$ holds and if $\widehat{P}$ is strictly
substochastic and $\{a\}$ is the unique stochastic class then $\mathbb{P}_y(\widehat{T}_a < \widehat{T}) = \varphi(y)/\varphi(a)$ and
the relation (11) is satisfied.

Proof. (i) From $He_a = c\mathbf{1}$ we get,

$e_a'\widehat{P} = e_a'H'D_\pi\overleftarrow{P}D_\pi^{-1}H'^{-1} = (He_a)'D_\pi\overleftarrow{P}D_\pi^{-1}H'^{-1}$
$= c\pi'\overleftarrow{P}D_\pi^{-1}H'^{-1} = c\mathbf{1}'H'^{-1} = (H^{-1}c\mathbf{1})' = e_a'$.

Then

$\widehat{P}(a,y) = (e_a'\widehat{P})(y) = e_a'(y) = \delta_{a,y}$.
So $a$ is an absorbing state for $\widehat{P}$.

(ii) From Theorem 2 (v), $a$ is an absorbing state of $\widetilde{P}$ and $\pi' = e_a'\Lambda$. The rest of
part (ii) follows straightforwardly. □

When P does not satisfy positive recurrence let us only consider the following 
special case. 

Proposition 4. Let $x_0 \in I$ be an absorbing point of the kernel $P$ and let $\widehat{P}$ be a
substochastic kernel that is a $H$-dual of $P$: $H\widehat{P}' = PH$. Then $h(y) := H(x_0, y)$,
$y \in \widehat{I}$, is a non negative $\widehat{P}$-harmonic function. When $H$ is bounded and $\widehat{P}$ is a
stochastic recurrent kernel, the $x_0$-row $H(x_0, \cdot)$ is constant.

Proof. It suffices to show that the function $h$ is $\widehat{P}$-harmonic. Since $P(x_0, z) = \delta_{z,x_0}$
$\forall z \in I$, we get $(PH)(x_0, y) = H(x_0, y)$. Therefore, if $\widehat{P}$ verifies the duality equality
(2) we get $(H\widehat{P}')(x_0, y) = H(x_0, y) = h(y)$. Then

$(\widehat{P}h)(y) = \sum_{z \in \widehat{I}} \widehat{P}(y,z)H(x_0,z) = \sum_{z \in \widehat{I}} H(x_0,z)\widehat{P}'(z,y) = (H\widehat{P}')(x_0,y) = h(y)$,

and the result is shown. □



4. Classes of Dual matrices 

We consider the finite set case. We assume $I = \widehat{I} = \widetilde{I} = \{0, \dots, N\}$, so the kernels
are non negative $I \times I$ matrices and when they are substochastic the associated
Markov chains take values in $I$.

We will study some classes of non negative matrices $H$ for which there exist
substochastic kernels $P$ and $\widehat{P}$ in duality relation (2). So, in these cases we are
able to apply the results established in Theorem 2 and Proposition 3.
4.1. The potential case. Let us see what happens with a quite general class of
matrices, the finite potential kernels. Let $R$ be a strictly substochastic kernel with
no stochastic classes (this is the case if $R$ is also irreducible). Then it has a well
defined finite potential,

$H = (Id - R)^{-1} = \sum_{n \ge 0} R^n \ge 0$.

So $H^{-1} = Id - R$. (In particular no column nor row of $H$ vanishes.)

Let $P$ be a substochastic kernel. Define

$\widehat{P}' = H^{-1}PH = (Id - R)P(Id - R)^{-1}$.

Proposition 5. Assume that also the transposed matrix $R'$ is substochastic. Then
$\widehat{P}\mathbf{1} \ge 0$ and there exists a stochastic kernel $P$ for which $\widehat{P} \ge 0$. Indeed,
the constant stochastic kernel $P = \frac{1}{N+1}\mathbf{1}\mathbf{1}'$ fulfills the property.

Proof. Since $R'$ is substochastic we have $(Id - R')\mathbf{1} \ge 0$. Then
$\mathbf{1}'\widehat{P}' = \mathbf{1}'(Id - R)P(Id - R)^{-1} \ge 0$.

Now, since $(Id - R)^{-1} \ge 0$ and $\widehat{P}' = (P - RP)(Id - R)^{-1}$, we get that once the
relation $RP \le P$ is verified then $\widehat{P}' \ge 0$. Since $R$ is substochastic the matrix
$P = \frac{1}{N+1}\mathbf{1}\mathbf{1}'$ does the job. □
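The argument of Proposition 5 can be checked on a small case with the following sketch. The weighted upper triangular kernel `R` below is a hypothetical choice (strictly substochastic, without stochastic classes, and with substochastic transpose); because it is nilpotent, its potential is a finite sum and no matrix inversion routine is needed.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

N = 3
n = N + 1
Id = [[F(int(x == y)) for y in range(n)] for x in range(n)]
# Hypothetical R: weight 1/2 on x -> x+1 and weight 1/4 on x -> x+2.
R = [[F(1, 2) * int(y == x + 1) + F(1, 4) * int(y == x + 2) for y in range(n)]
     for x in range(n)]
assert all(sum(col) <= 1 for col in zip(*R))          # R' is substochastic

# R is strictly upper triangular, hence nilpotent: H = Id + R + ... + R^N.
H, power = [row[:] for row in Id], [row[:] for row in Id]
for _ in range(N):
    power = matmul(power, R)
    H = [[h + p for h, p in zip(hr, pr)] for hr, pr in zip(H, power)]

Id_minus_R = [[Id[x][y] - R[x][y] for y in range(n)] for x in range(n)]
assert matmul(Id_minus_R, H) == Id                    # H is the inverse of Id - R

P = [[F(1, n)] * n for _ in range(n)]                 # constant kernel 11'/(N+1)
RP = matmul(R, P)
assert all(RP[x][y] <= P[x][y] for x in range(n) for y in range(n))   # RP <= P

P_hat_t = matmul(matmul(Id_minus_R, P), H)            # transpose of the dual kernel
P_hat = [list(col) for col in zip(*P_hat_t)]
assert all(v >= 0 for row in P_hat for v in row)      # the dual is non negative
```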

4.2. Siegmund kernel. A well-known case of a kernel $H$ arising as a potential of
a strictly substochastic kernel $R$ as above is the Siegmund kernel. Let $R(x,y) =
\mathbf{1}(x + 1 = y)$, so it is strictly substochastic (because the $N$-th row vanishes) and
it has no stochastic classes. Its transposed matrix $R'(x,y) = \mathbf{1}(x = y + 1)$ is also
substochastic.

By direct computation we get that $H_S = (Id - R)^{-1}$ verifies

$H_S(x,y) = \mathbf{1}(x \le y)$,

so it is the Siegmund kernel. We have $H_S^{-1} = Id - R$, then $H_S^{-1}(x,y) = \mathbf{1}(x =
y) - \mathbf{1}(x + 1 = y)$.
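The direct computation mentioned above can be reproduced with a short sketch, here for the illustrative size N = 4 (the value of `N` and the helper name `matmul` are assumptions of the example). Since $R^k(x,y) = \mathbf{1}(x + k = y)$ vanishes for $k > N$, the potential series is a finite sum.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

N = 4
n = N + 1
Id = [[int(x == y) for y in range(n)] for x in range(n)]
R = [[int(x + 1 == y) for y in range(n)] for x in range(n)]    # R(x, y) = 1(x + 1 = y)

# Potential of R: H_S = Id + R + R^2 + ... + R^N.
H_S, power = [row[:] for row in Id], [row[:] for row in Id]
for _ in range(N):
    power = matmul(power, R)
    H_S = [[h + p for h, p in zip(hr, pr)] for hr, pr in zip(H_S, power)]

assert H_S == [[int(x <= y) for y in range(n)] for x in range(n)]   # Siegmund kernel

# Inverse given in the text: 1(x = y) - 1(x + 1 = y).
H_S_inv = [[int(x == y) - int(x + 1 == y) for y in range(n)] for x in range(n)]
assert matmul(H_S, H_S_inv) == Id and matmul(H_S_inv, H_S) == Id

assert sum(R[N]) == 0                           # the N-th row of R vanishes
assert all(sum(col) <= 1 for col in zip(*R))    # R' is substochastic
```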

This case has been studied in detail, for instance see [1] Section 5. Let us
summarize some well-known observations. We have $(H_S\widehat{P}')(x,y) = \sum_{z \ge x}\widehat{P}(y,z)$ and
$(PH_S)(x,y) = \sum_{z \le y} P(x,z)$. Then, the equation $H_S\widehat{P}' = PH_S$ gives

(13) $\widehat{P}(y,x) = \sum_{z \ge x}\widehat{P}(y,z) - \sum_{z > x}\widehat{P}(y,z) = \sum_{z \le y}\big(P(x,z) - P(x+1,z)\big)$,

with the convention $P(N+1, \cdot) = 0$. In particular $\widehat{P} \ge 0$ requires the condition,

(14) $\forall y \in I$ : $\sum_{z \le y} P(x,z)$ decreases with $x \in I$.

In this case $P$ is called monotone.
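Formula (13) and the monotonicity condition (14) translate directly into code. The sketch below is only an illustration: the function names `siegmund_dual`, `is_monotone` and `is_siegmund_dual` and the two 3-state kernels are hypothetical, and the convention $P(N+1, \cdot) = 0$ is used for the last column of the dual.

```python
from fractions import Fraction as F

def cumulative(P):
    """cum[x][y] = sum_{z <= y} P(x, z)."""
    return [[sum(row[:y + 1]) for y in range(len(P))] for row in P]

def siegmund_dual(P):
    """Candidate dual from (13): P_hat(y, x) = sum_{z<=y} (P(x, z) - P(x+1, z))."""
    n = len(P)
    cum = cumulative(P) + [[0] * n]          # extra row encodes P(N+1, .) = 0
    return [[cum[x][y] - cum[x + 1][y] for x in range(n)] for y in range(n)]

def is_monotone(P):
    """Condition (14): x -> sum_{z<=y} P(x, z) is nonincreasing for every y."""
    n = len(P)
    cum = cumulative(P)
    return all(cum[x][y] >= cum[x + 1][y] for x in range(n - 1) for y in range(n))

def is_siegmund_dual(P, P_hat):
    """Entrywise form of H_S P_hat' = P H_S."""
    n = len(P)
    return all(sum(P_hat[y][x:]) == sum(P[x][:y + 1])
               for x in range(n) for y in range(n))

# A monotone example: the candidate dual is a non negative kernel.
P1 = [[F(1, 2), F(1, 2), F(0)],
      [F(1, 4), F(1, 2), F(1, 4)],
      [F(0), F(1, 2), F(1, 2)]]
# A non monotone example: the matrix solving the duality equation has a negative entry.
P2 = [[F(0), F(1), F(0)],
      [F(1, 2), F(0), F(1, 2)],
      [F(0), F(1), F(0)]]

for P in (P1, P2):
    assert is_siegmund_dual(P, siegmund_dual(P))
assert is_monotone(P1) and not is_monotone(P2)
assert min(min(row) for row in siegmund_dual(P1)) >= 0
assert min(min(row) for row in siegmund_dual(P2)) < 0
```

The duality equation always has a matrix solution because $H_S$ is invertible; monotonicity is what makes that solution non negative.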
Also, from $\widehat{P}(N,x) = \sum_{z \le N}\big(P(x,z) - P(x+1,z)\big)$ we deduce that

$P$ stochastic $\Rightarrow$ $\widehat{P}(N,x) = \delta_{x,N}$,
so $N$ is an absorbing state of $\widehat{P}$. Also from (13) we get that

(15) $\widehat{P}(N-1,N) = \sum_{z \le N-1} P(N,z) = 1 - P(N,N)$.

We also observe that

$\sum_{x \le N}\widehat{P}(y,x) = \sum_{z \le y} P(0,z)$,

in particular

(16) $\widehat{P}\mathbf{1} \le \mathbf{1}$, so $\widehat{P}$ is substochastic,
and also $\sum_{x \le N}\widehat{P}(0,x) = P(0,0)$. Then,

(17) $P(0,0) = 1 \Rightarrow \widehat{P}$ is stochastic;

(18) $P(0,0) < 1 \Rightarrow \sum_{x \le N}\widehat{P}(0,x) < 1$ and $\widehat{P}$ loses mass through 0.

This last case occurs for any irreducible stochastic kernel $P$ with $N \ge 1$. Indeed,
in this case $P(0,0) = 1$ cannot happen because it contradicts irreducibility.

Also we get,

$P(0,0) + P(0,1) = 1 \Rightarrow \widehat{P}$ does not lose mass through $\{1, \dots, N\}$.
When the finite matrix $P$ is irreducible we can apply Theorem 2 and in this case

(19) $\varphi(x) = (H_S'\pi)(x) = \sum_{y \in I}\mathbf{1}(y \le x)\pi(y) = \sum_{y \le x}\pi(y) =: \pi^c(x)$

is the cumulative distribution of $\pi$. We have that $\pi^c$ is not constant because $\pi > 0$.
Let us show that

(20) $N$ is the unique absorbing state of $\widehat{P}$.

Indeed, from (13), a state $x < N$ verifies $\widehat{P}(x,y) = \delta_{y,x}$
if and only if $\sum_{z \le x} P(x,z) = 1$ and $\sum_{z \le x} P(x+1,z) = 0$. Therefore, also from (14),
we would get

$\sum_{z \le x} P(y,z) = 1$ $\forall y \le x$ and $\sum_{z \le x} P(y,z) = 0$ $\forall y > x$,

which contradicts the irreducibility of $P$.

We obtain the following result. In it we assume $N \ge 1$ to avoid the trivial case
when $N = 0$ and $P(0,0) = 1$.
Corollary 6. Let $H_S$ be the Siegmund kernel and $P$ be a monotone finite irreducible
stochastic kernel with stationary distribution $\pi$. Let $H_S\widehat{P}' = PH_S$ with $\widehat{P} \ge 0$.
Then:

(i) $\widehat{P}$ is a strictly substochastic kernel that loses mass through 0, and parts (iv2)
and (iv3) of Theorem 2 hold.

(ii) $\varphi = \pi^c$ and the stochastic intertwining kernel $\Lambda$ verifies

(21) $\Lambda(x,y) = \mathbf{1}(x \ge y)\frac{\pi(y)}{\pi^c(x)}$,

and the intertwining matrix $\widetilde{P}$ of $\overleftarrow{P}$ is given by

$\widetilde{P}(x,y) = \widehat{P}(x,y)\frac{\pi^c(y)}{\pi^c(x)}$, $x, y \in I$.

(iii) $N$ is the unique absorbing state for $\widetilde{P}$ and Theorem 2 parts (iv4) and (v) are
verified with $\widehat{I}_\ell = \{N\}$ and $a = N$. In particular $\pi' = e_N'\Lambda$ holds.

(iv) The following relation holds:

(22) $\Lambda e_N = \pi(N)e_N$.

Proof. The first three parts are a direct consequence of Theorem 2 and relations (16),
(18), (19), (20). Finally, (22) is a direct computation from (21) and $\varphi(N) = 1$. □

4.3. Duality for finite state space birth and death chains. Recall $I = \widehat{I} =
\widetilde{I} = \{0, \dots, N\}$. Let $X = (X_n : n \ge 0)$ be a discrete birth and death (BD)
chain with transition Markov kernel $P = (P(x,y) : x, y \in I)$. Then $P(x,y) = 0$ if
$|x - y| > 1$ and

$P(x,x+1) = p_x$, $P(x,x-1) = q_x$, $P(x,x) = r_x$, $x \in I$,

with

$q_x + r_x + p_x = 1$ $\forall x \in I$ and boundary conditions $q_0 = p_N = 0$.
We always take

$q_x, p_x > 0$, $x \in \{1, \dots, N-1\}$.
We will assume the irreducible case, which in this case is equivalent to the condition

(23) $p_0 > 0$, $q_N > 0$.

(A unique exception will be made in Subsection 4.4, where we will explicitly assume
that (23) is not satisfied.) The stationary distribution $\pi = (\pi(x) : x \in I)$ verifies
$\pi(y) = \pi(0)\prod_{z<y}\frac{p_z}{q_{z+1}} > 0$, $y \in \{1, \dots, N\}$, where $\pi(0)$ fulfills $\sum_{y \in I}\pi(y) = 1$.
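The product formula for the stationary distribution is easy to evaluate exactly. In the sketch below the function names `bd_kernel` and `bd_stationary` and the rates `p`, `q` on $\{0, 1, 2\}$ are hypothetical; the check uses detailed balance and the stationarity equation $\pi'P = \pi'$.

```python
from fractions import Fraction as F

def bd_kernel(p, q):
    """Tridiagonal kernel with P(x,x+1) = p[x], P(x,x-1) = q[x], P(x,x) = 1 - p[x] - q[x]."""
    n = len(p)
    P = [[F(0)] * n for _ in range(n)]
    for x in range(n):
        if x + 1 < n:
            P[x][x + 1] = p[x]
        if x >= 1:
            P[x][x - 1] = q[x]
        P[x][x] = 1 - p[x] - q[x]
    return P

def bd_stationary(p, q):
    """pi(y) proportional to prod_{z<y} p[z] / q[z+1], normalized to sum 1."""
    weights = [F(1)]
    for y in range(1, len(p)):
        weights.append(weights[-1] * p[y - 1] / q[y])
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical irreducible rates with q_0 = p_N = 0, p_0 > 0, q_N > 0.
p = [F(1, 3), F(1, 4), F(0)]
q = [F(0), F(1, 4), F(1, 3)]
P = bd_kernel(p, q)
pi = bd_stationary(p, q)

n = len(P)
# Detailed balance pi(x) P(x,y) = pi(y) P(y,x), hence stationarity.
assert all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in range(n) for y in range(n))
assert [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)] == pi
```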



The matrix $P$ is self-adjoint with the inner product given by $\pi$, that is it verifies
$\pi(x)P(x,y) = \pi(y)P(y,x)$ for all $x, y$. So, $\overleftarrow{P} = P$ where $\overleftarrow{P} = D_\pi^{-1}P'D_\pi$ is the
transition matrix of the time reversed process and $P$ has real eigenvalues.
The unique constraint in (13) for $\widehat{P} \ge 0$ to be satisfied is for $y = x$, that is we
need the condition $\widehat{P}(x,x) \ge 0$, which reads

(24) $\forall x \in \{0, \dots, N-1\}$ : $p_x + q_{x+1} \le 1$.

This is the equivalent of (14) for BD chains. So, when (24) is satisfied we say that
$P$ is monotone. In this case the Siegmund dual $\widehat{P}$ exists and it is a BD kernel with

$\widehat{P}(x,x-1) = p_x$, $\widehat{P}(x,x) = 1 - (p_x + q_{x+1})$, $\widehat{P}(x,x+1) = q_{x+1}$.
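The closed form of the BD Siegmund dual can be compared with the duality equation on a small example. The sketch below is an illustration under stated assumptions: the function names and the rates are hypothetical, and the convention $q_{N+1} = 0$ is used at the upper boundary.

```python
from fractions import Fraction as F

def bd_kernel(p, q):
    """Tridiagonal kernel with P(x,x+1) = p[x], P(x,x-1) = q[x], P(x,x) = 1 - p[x] - q[x]."""
    n = len(p)
    P = [[F(0)] * n for _ in range(n)]
    for x in range(n):
        if x + 1 < n:
            P[x][x + 1] = p[x]
        if x >= 1:
            P[x][x - 1] = q[x]
        P[x][x] = 1 - p[x] - q[x]
    return P

def bd_siegmund_dual(p, q):
    """Dual BD kernel: down-rate p[x], up-rate q[x+1], holding 1 - (p[x] + q[x+1])."""
    n = len(p)
    q_ext = list(q) + [F(0)]                 # convention q_{N+1} = 0
    D = [[F(0)] * n for _ in range(n)]
    for x in range(n):
        if x >= 1:
            D[x][x - 1] = p[x]
        D[x][x] = 1 - (p[x] + q_ext[x + 1])
        if x + 1 < n:
            D[x][x + 1] = q_ext[x + 1]
    return D

# Hypothetical monotone rates on {0, 1, 2}.
p = [F(1, 3), F(1, 4), F(0)]
q = [F(0), F(1, 4), F(1, 3)]
assert all(p[x] + q[x + 1] <= 1 for x in range(len(p) - 1))    # condition (24)

P, P_hat = bd_kernel(p, q), bd_siegmund_dual(p, q)
n = len(P)
# Entrywise Siegmund duality: sum_{z>=x} P_hat(y,z) = sum_{z<=y} P(x,z).
assert all(sum(P_hat[y][x:]) == sum(P[x][:y + 1]) for x in range(n) for y in range(n))
assert sum(P_hat[0]) == 1 - p[0]             # mass p_0 is lost at state 0
assert P_hat[n - 1] == [0, 0, 1]             # N is absorbing for the dual
```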

The drift of $X$ at $x$ is $f(x) := p_x - q_x$, and the drift of $\widehat{X}$ at $x$ is $\widehat{f}(x) = \widehat{p}_x - \widehat{q}_x =
-f(x+1) + (p_{x+1} - p_x)$, so $-f(x+1) - r_{x+1} \le \widehat{f}(x) \le -f(x) + r_x$.

Note that $\widehat{P}(0,0) = 1 - (p_0 + q_1)$ and $\widehat{P}(0,1) = q_1$, so the Markov chain $\widehat{X}$ loses
mass through the state 0 if and only if $p_0 > 0$, or equivalently $r_0 < 1$. On the
other hand $\widehat{P}(N,N) = 1 - (p_N + q_{N+1}) = 1$ (because $p_N = q_{N+1} = 0$), so $N$ is
an absorbing state. From
this analysis, Corollary 6 and the reversibility relation $P = \overleftarrow{P}$ we can state the
following result.

Corollary 7. Let $H_S$ be the Siegmund kernel and $P$ be a finite irreducible stochastic
BD chain with monotone kernel $P$ and whose parameters are $p_x$, $q_x$. Let $\pi$ be the
stationary distribution of $P$. Then, the dual matrix $\widehat{P}$ defined by $\widehat{P}' = H_S^{-1}PH_S$ is
a strictly substochastic kernel that loses mass through the state 0. Moreover:

(i) $\varphi = \pi^c$ and parts (iv2) and (iv3) of Theorem 2 hold.

(ii) $N$ is an absorbing state of $\widehat{P}$ and $\{N\}$ is the unique stochastic class of $\widehat{P}$, all
the other states in $\widehat{I}$ are transient, and Theorem 2 (iv4) is verified with $\widehat{I}_\ell = \{N\}$.

(iii) Let $\Lambda$ be the stochastic kernel given by (21). Then, the $\Lambda$-intertwining matrix
$\widetilde{P}$ of $P$ is given by

$\widetilde{P}(x,x-1) = p_x\frac{\pi^c(x-1)}{\pi^c(x)}$, $\widetilde{P}(x,x) = 1 - (p_x + q_{x+1})$, $\widetilde{P}(x,x+1) = q_{x+1}\frac{\pi^c(x+1)}{\pi^c(x)}$.

4.4. Absorbing points for the BD kernels. Let us modify the BD kernel $P$
by taking 0 as an absorbing state. That is, instead of the irreducibility conditions
(23) we take $p_0 = 0$ and no restriction on $q_N$: it could be $0$ or $> 0$. Assume $P$ is
monotone, so (24) holds. Then the BD kernel $\widehat{P}$ is stochastic, see (17). In this case
$N$ is the unique absorbing state for $\widehat{P}$.

Let us describe what happens by exploiting the special form of the Siegmund dual.
By evaluating (3) at $y = 0$ we get

(25) $\mathbb{P}_x(X_n \le 0) = \mathbb{P}_0(x \le \widehat{X}_n)$,
and by evaluating (3) at $x = N$ we obtain

(26) $\mathbb{P}_N(X_n > y) = \mathbb{P}_y(\widehat{X}_n < N)$.

Now there are two cases:
(i) If $q_N > 0$ then 0 is the unique absorbing state for $P$. By (15) we get that
$\widehat{P}(N-1,N) = q_N > 0$, so $N$ is an absorbing state that attracts all the trajectories
of the chain $\widehat{X}$: $\mathbb{P}_x(\lim_{n\to\infty}\widehat{X}_n = N) = 1$ for $x \in I$.

(ii) If $q_N = 0$, then 0 and $N$ are absorbing states for $P$. By using (15) we get that
$\widehat{P}(N-1,N) = q_N = 0$, so $N$, besides being an absorbing state for $\widehat{P}$, is an isolated
state for $\widehat{P}$ (that is $\widehat{P}(y,N) = 0$ for all $y < N$). Therefore it does not attract any
of the trajectories starting from a state different from $N$. Hence, the equation (26)
is simply the equality $1 = 1$ when $y < N$.



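Let us summarize the picture for (ii): $P$ has 0 and $N$ as absorbing states
that attract all the trajectories of its associated Markov chain $X$, $\widehat{P}$ is stochastic, $N$
is a $\widehat{P}$-absorbing isolated state, and $\widehat{P}|_{I\setminus\{N\} \times I\setminus\{N\}}$ is stochastic and irreducible.
Let $\widehat{\pi}^* = (\widehat{\pi}^*(z) : z \in I \setminus \{N\})$ be the stationary distribution of the submatrix
$\widehat{P}|_{I\setminus\{N\} \times I\setminus\{N\}}$.

Let $\phi(x) = \mathbb{P}_x(\lim_{n\to\infty} X_n = 0)$ be the absorption probability at 0 of the chain $X$
starting from $x$. We have the following result.

Proposition 8. If $p_0 = 0$ and $q_N = 0$ then 0 and $N$ are absorbing states for
$P$, $\widehat{P}|_{I\setminus\{N\} \times I\setminus\{N\}}$ is stochastic and $\widehat{P}$ has $N$ as an absorbing point. Let $\phi(x) =
\mathbb{P}_x(\lim_{n\to\infty} X_n = 0)$, and $\widehat{\pi}^* = (\widehat{\pi}^*(z) : z \in I \setminus \{N\})$ be the stationary distribution of
$\widehat{P}|_{I\setminus\{N\} \times I\setminus\{N\}}$. Then, for $x < N$,

$\phi(x) = 1 - \frac{\eta(x)}{\eta(N)} = 1 - \widehat{\pi}^{*c}(x-1)$,

where $\widehat{\pi}^{*c}$ is the cumulative distribution of $\widehat{\pi}^*$ (with $\widehat{\pi}^{*c}(-1) = 0$) and
$\eta(x) := \sum_{y=0}^{x-1}\prod_{z=1}^{y}\frac{q_z}{p_z}$ is the scale function of $P$.

Proof. The first equality follows from the fact that $\eta(X_n)$ is a martingale and $\eta(0) = 0$.
For the second relation we take $x < N$ and let $n \to \infty$ in the formula (25), which
gives

$\phi(x) = \sum_{z \ge x}\widehat{\pi}^*(z) = 1 - \widehat{\pi}^{*c}(x-1)$.

□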

4.5. The spectral characterization. Let us give a sufficient spectral property
for the monotonicity of the kernel $P$ for an irreducible BD chain taking values
on $I = \{0, \dots, N\}$. Consider the polynomials $(\mathfrak{q}_y(t) : y \in I)$ with $t \in [-1,1]$,
determined by: $\mathfrak{q}_0(t) = 1$ for all $t$ and the recurrence:

$t\,\mathfrak{q}_0(t) = p_0\mathfrak{q}_1(t) + r_0\mathfrak{q}_0(t)$,

$t\,\mathfrak{q}_y(t) = p_y\mathfrak{q}_{y+1}(t) + r_y\mathfrak{q}_y(t) + q_y\mathfrak{q}_{y-1}(t)$, $y \in \{1, \dots, N-1\}$.
It holds $\mathfrak{q}_y(1) = 1$ for all $y \ge 0$ and the polynomial $\mathfrak{q}_y(t)$ is of degree $y$ in $t$.
Let $Z := \{t_k : R_{N+1}(t_k) = 0\}$ be the zeros of the polynomial $R_{N+1}(t) = t\,\mathfrak{q}_N(t) -
r_N\mathfrak{q}_N(t) - q_N\mathfrak{q}_{N-1}(t)$, which is of degree $N+1$. The set $Z$ constitutes the spectrum
of $P$ (see [12], p. 78). All the zeros are simple and we order them by $1 = t_0 > t_1 >
\dots > t_N \ge -1$. The quantity $1 - t_1$ is the spectral gap. The spectral probability
measure on $[-1,1]$ is $\mu(dt) := \sum_{k=0}^{N}\mu_k\delta_{t_k}$, with respect to which $(\mathfrak{q}_y(t) : y \ge 1)$
are orthogonal. It is known that $\mu_0 = \pi(0)$.

Let $N = 2N_0$ be even. Assume that the BD chain is given by $r_x = 0$ for all $x \in I$,
and that it is reflected at the boundaries $\{0, N\}$, so $p_0 = q_N = 1$. In this case
the spectral measure is symmetric on $[-1,1]$, in particular $t_{N_0} = 0$ and $t_N = -1$.
When $N = 2N_0 + 1$ is odd, the spectral measure is again symmetric, but 0 is no
longer an eigenvalue and $t_{N_0} > 0$.

A spectral sufficient condition for the monotone property (24) is given below in
part (i). This result can be found in Lemma 2.4 of [7], and here we give a different
proof. On the other hand note that when $r_x \ge 1/2$ $\forall x \in I$ then obviously the
monotone condition (24) is satisfied. In part (ii) we reinforce this implication.

Proposition 9. (i) If a BD chain is spectrally non negative, then it is monotone, 
(ii) If r x > 1/2 for all x € I, then the BD chain is spectrally positive. 

Proof. Let us show (i). For a BD chain $X$ whose transition matrix $P$ is spectrally non negative, there exists a BD chain $Y$ taking values on $\{0, \dots, 2N\}$, reflected at the boundary, started at an even integer and such that $X \sim (Y_{2n}/2 : n \ge 0)$. This follows simply from adapting [20], Th. 2.1 to the finite case. As noted just before, the spectral measure of $Y$ is symmetric on $[-1,1]$ and by passing to $X$ the spectrum is being folded: if $\sum_{k=0}^{2N} \mu_k \delta_{t_k}$ is the symmetric spectral measure of $Y$ with $t_N = 0$, then $2\sum_{k=0}^{N} \mu_k \delta_{t_k^2}$ is the spectral measure of $X$. Let $\alpha_y$ and $\beta_y$ be the up and down probabilities that $Y_m \to Y_{m+1} = Y_m \pm 1$ given that $Y_m$ is in a state $y$ different from the endpoints. We have $\alpha_y + \beta_y = 1$, and then:

$q_x = \beta_{2x}\beta_{2x-1}$, $r_x = \beta_{2x}\alpha_{2x-1} + \alpha_{2x}\beta_{2x+1}$, $p_x = \alpha_{2x}\alpha_{2x+1}$.

This, together with $p_0 = \alpha_1$ and $q_N = \beta_{2N-1}$, allows one to determine recursively the transition matrix of $Y$ from the one of $X$. From these facts we deduce that our hypothesis implies

$p_x + q_{x+1} = \alpha_{2x}\alpha_{2x+1} + \beta_{2x+2}\beta_{2x+1} \le \alpha_{2x}\alpha_{2x+1} + \beta_{2x+1} \le 1$,

and so the chain $X$ is monotone.

For the proof of (ii) first note that $P = D_\pi^{-1/2}\, Q\, D_\pi^{1/2}$, where $Q$ is a symmetric matrix given by $Q(x,y) = 0$ when $|x - y| > 1$ and

$Q(x, x+1) = \sqrt{p_x q_{x+1}} = Q(x+1, x)$, $Q(x,x) = r_x$, $x \in I$.

Now consider the superdiagonal matrix $S$ such that $S(x,y) = 0$ if $y \notin \{x, x+1\}$ and

$S(x,x) = \sqrt{p_x}$, $x \in I$; $S(x, x+1) = \sqrt{q_{x+1}}$, $x \in I$, $x \ne N$.

Then $S'S$ is a tridiagonal symmetric matrix, with $S'S(x,y) = 0$ if $|x - y| > 1$ and

$S'S(x,x) = p_x + q_x$, $S'S(x, x+1) = \sqrt{p_x q_{x+1}} = S'S(x+1, x)$, $x \in I$.

Consider the diagonal matrix $D_r$ with $r = (r_0, \dots, r_N)$. We have $Q = 2D_r - I + S'S$, so $Q$ is the sum of a diagonal matrix and a symmetric positive semidefinite matrix. We conclude that, if the holding probabilities satisfy $r_x > 1/2$ for all $x \in I$, then for all $z \in \mathbb{R}^{N+1} \setminus \{0\}$,

$z'Qz = \sum_{x=0}^{N} (2r_x - 1)\,|z_x|^2 + |Sz|^2 > 0$,

and so $Q$ and $P$ are positive definite. □
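The decomposition $Q = 2D_r - I + S'S$ used in the proof of (ii) can be verified entrywise. The sketch below assumes a hypothetical BD chain with $r_x > 1/2$; the helper names are illustrative and not part of the paper.

```python
import math


def decomposition_gap(p, q):
    """Return (largest entry of |Q - (2 D_r - I + S'S)|, Q) for a BD chain."""
    n = len(p)
    r = [1.0 - p[x] - q[x] for x in range(n)]
    Q = [[0.0] * n for _ in range(n)]
    S = [[0.0] * n for _ in range(n)]
    for x in range(n):
        Q[x][x] = r[x]
        S[x][x] = math.sqrt(p[x])
        if x + 1 < n:
            Q[x][x + 1] = Q[x + 1][x] = math.sqrt(p[x] * q[x + 1])
            S[x][x + 1] = math.sqrt(q[x + 1])
    gap = 0.0
    for i in range(n):
        for j in range(n):
            sts = sum(S[k][i] * S[k][j] for k in range(n))       # (S'S)(i, j)
            rhs = (2.0 * r[i] - 1.0 if i == j else 0.0) + sts
            gap = max(gap, abs(Q[i][j] - rhs))
    return gap, Q


def quadratic_form(Q, z):
    """z'Qz for a square matrix Q given as a list of rows."""
    n = len(z)
    return sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical chain with holding probabilities 0.7, 0.6, 0.6, 0.6.
p = [0.3, 0.2, 0.2, 0.0]
q = [0.0, 0.2, 0.2, 0.4]
gap, Q = decomposition_gap(p, q)
form_value = quadratic_form(Q, [1.0, -1.0, 1.0, -1.0])   # alternating test vector
```

The alternating vector is a natural stress test because it makes the off-diagonal contribution of $Q$ as negative as possible.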

We emphasize that in part (ii) we show that $r_x > 1/2$ $\forall x \in I$ implies that the spectrum is positive, and that this is a stronger property than monotonicity in view of (i). On the other hand the condition $r_x > 1/2$ for all $x \in I$ is sufficient to get a positive spectrum but, as it is easy to see, it is not necessary.

Example: An example showing that a non negative spectrum is not necessary for the monotone property (24) is the BD chain given by $p_x = p$, $q_x = q$, $x = 1, \dots, N-1$, and boundary conditions $r_0 = q$, $p_0 = p$, $q_N = q$, $r_N = p$, where $p \in (0,1)$ and $q = 1 - p$. Then the monotone property holds but the spectrum fails to be non negative. Indeed, from [6], p. 438 it follows that $t_k = 2\sqrt{pq}\,\cos\big(\frac{k\pi}{N+1}\big)$, $k = 1, \dots, N-1$, $t_0 = 1$, $t_N = -t_1$.
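The eigenvalues quoted in this example can be checked without an eigenvalue solver, since the characteristic polynomial of a tridiagonal matrix satisfies a three-term recurrence. This is a sketch under the assumption that the cosine argument is $k\pi/(N+1)$, which also covers $t_N = -t_1$; the function names are illustrative.

```python
import math


def char_poly(p, N, t):
    """det(P - t I) for the example chain, by the tridiagonal recurrence (N >= 1)."""
    q = 1.0 - p
    diag = [q] + [0.0] * (N - 1) + [p]          # r_0 = q, r_x = 0 inside, r_N = p
    prev2, prev1 = 1.0, diag[0] - t
    for x in range(1, N + 1):
        # every off-diagonal product P(x-1, x) P(x, x-1) equals p q here
        prev2, prev1 = prev1, (diag[x] - t) * prev1 - p * q * prev2
    return prev1


def claimed_eigenvalues(p, N):
    """t_0 = 1 and t_k = 2 sqrt(pq) cos(k pi / (N + 1)), k = 1..N."""
    q = 1.0 - p
    root = 2.0 * math.sqrt(p * q)
    return [1.0] + [root * math.cos(k * math.pi / (N + 1)) for k in range(1, N + 1)]


eigs = claimed_eigenvalues(0.3, 6)
residuals = [abs(char_poly(0.3, 6, t)) for t in eigs]   # all should be ~0
has_negative_eigenvalue = min(eigs) < 0.0
```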

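4.6. The Moran model. Let us introduce the 2-allele Moran model with bias mechanism $p$. Let

$p : [0,1] \to [0,1]$ be continuous with $0 \le p(0)$ and $p(1) \le 1$.

Denote $q(u) := 1 - p(u)$. The Moran model is a BD Markov chain $X$ characterized by the quadratic transition probabilities $p_x$, $r_x$, $q_x$, $x \in I = \{0, \dots, N\}$,

$p_x = \big(1 - \frac{x}{N}\big)\, p\big(\frac{x}{N}\big)$, $q_x = \frac{x}{N}\, q\big(\frac{x}{N}\big)$, $r_x = 1 - p_x - q_x$.

Assuming $p_0 = p(0) > 0$ and $q_N = 1 - p(1) > 0$, the BD chain is irreducible with invariant distribution given, for $y \in \{1, \dots, N\}$, by

$\pi(y) = \pi(0) \prod_{x=1}^{y} \frac{p_{x-1}}{q_x} = \pi(0) \binom{N}{y} \prod_{x=1}^{y} \frac{p((x-1)/N)}{q(x/N)}$,

where $\pi(0)$ is the normalizing constant.

If $X$ is a Moran model defined by some bias $p$, then $\overline{X}_n := N - X_n$ is also a Moran model with bias $\overline{p}(u) = 1 - p(1-u)$, and so with parameters

$\overline{q}_x = p_{N-x} = \frac{x}{N}\, \overline{q}\big(\frac{x}{N}\big)$, $\overline{p}_x = q_{N-x} = \big(1 - \frac{x}{N}\big)\, \overline{p}\big(\frac{x}{N}\big)$,

where $\overline{q}(u) := 1 - \overline{p}(u)$. The spectra of $P$ and $\overline{P}$ are the same.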

Proposition 10. Assume that in the Moran model the bias $p$ is nondecreasing. Then the BD chain is monotone, that is condition (24), $p_x + q_{x+1} \le 1$, is fulfilled (and so the Siegmund dual exists).

Proof. First, since $p_N = q_{N+1} = 0$ we have nothing to verify for $x = N$. Let us see what happens with $x = 0$. We need to guarantee $1 - p_0 - q_1 \ge 0$, but this is true because $p(1/N) \ge p(0) \ge N p(0) - (N-1)$.

Let us consider the case $x \in \{1, \dots, N-1\}$. We have the following relations, where in the first inequality we use that $p$ is nondecreasing,

$p_x + q_{x+1} = \big(1 - \frac{x}{N}\big) p\big(\frac{x}{N}\big) + \frac{x+1}{N}\, q\big(\frac{x+1}{N}\big) \le \big(1 - \frac{x}{N}\big) p\big(\frac{x}{N}\big) + \frac{x+1}{N}\, q\big(\frac{x}{N}\big)$

(27) $= \frac{1}{N}\Big( p\big(\frac{x}{N}\big)(N - 1 - 2x) + (x+1) \Big) \le 1$.

Now, the last inequality $\le 1$ in (27) is fulfilled because:

If $x = \frac{N-1}{2}$ it reduces to $\frac{x+1}{N} \le 1$;

If $x < \frac{N-1}{2}$ it reduces to $p\big(\frac{x}{N}\big) \le \frac{N-1-x}{N-1-2x}$, and this is satisfied because the right hand side of this expression is $\ge 1$;

If $\frac{N-1}{2} < x \le N-1$ it is verified because $N - 1 - x \ge 0$ and $N - 1 - 2x < 0$. □
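Proposition 10 lends itself to a quick numerical illustration. The sketch below assumes the quadratic rates $p_x = (1 - x/N)\,p(x/N)$ and $q_x = (x/N)(1 - p(x/N))$ and uses the affine bias $p(u) = (1-a_2)u + a_1(1-u)$ as a test family; the parameter values are hypothetical.

```python
def moran_rates(bias, N):
    """Up and down probabilities of the Moran chain for a given bias function."""
    up = [(1 - x / N) * bias(x / N) for x in range(N + 1)]
    down = [(x / N) * (1 - bias(x / N)) for x in range(N + 1)]
    return up, down


def is_monotone(up, down, tol=1e-12):
    """Check p_x + q_{x+1} <= 1 for x = 0..N-1."""
    return all(up[x] + down[x + 1] <= 1 + tol for x in range(len(up) - 1))


def mutation_bias(a1, a2):
    """Affine bias p(u) = (1 - a2) u + a1 (1 - u)."""
    return lambda u: (1 - a2) * u + a1 * (1 - u)


N = 10
# a1 + a2 <= 1: nondecreasing bias, so the chain should be monotone.
monotone_case = is_monotone(*moran_rates(mutation_bias(0.2, 0.3), N))
# a1 = a2 = 1: decreasing bias p(u) = 1 - u, where p_0 + q_1 = 1 + 1/N^2 > 1.
decreasing_case = is_monotone(*moran_rates(mutation_bias(1.0, 1.0), N))
```

The second case shows that the nondecreasing hypothesis cannot simply be dropped.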



Moran model with mutations. A basic bias example is the mutation mechanism

(28) $p(u) = (1 - a_2)\,u + a_1\,(1 - u)$,

where $(a_1, a_2)$ are mutation probabilities in $(0,1]$. The drift is $p(u) - u$. When $a_1 + a_2 \ne 1$, the invariant probability measure satisfies $\pi(x) = \pi(0) \binom{N}{x} \frac{(A_1)_x\,(A_2)_{N-x}}{(A_2)_N}$ with $A_i := N a_i/(1 - a_1 - a_2)$, where $(a)_x := \Gamma(a + x)/\Gamma(a)$.

When $p$ is non-decreasing, we have $a_1 + a_2 \le 1$. In $\overline{p}(u)$ the roles of $a_1$ and $a_2$ are exchanged.

The case $a_1 = a_2 = 1$, that is $p(u) = 1 - u$, corresponds to the heat-exchange Bernoulli-Laplace model [BJ. Here, $\pi(x) = \binom{N}{x}\binom{N}{N-x}/\binom{2N}{N}$. If $a_1 = a_2 = 1/2$ then $p(u) = 1/2$, which is amenable (through a suitable time substitution) to the Ehrenfest urn model provided $N$ is even.

One-way mutations, $(a_1, a_2) = (a_1, 0)$ or $(0, a_2)$, lead to the choice $p(1) = 1$ or $p(0) = 0$ respectively, corresponding to the case in which $N$ or $0$ is an absorbing state respectively.

Except for some special cases, the spectral measure associated to the Moran model is not known. Let us supply some of these special cases.

Spectral representation of the Moran model with mutations. Assume $a_1 + a_2 \ne 1$, [H]. Here the eigenvalues are

(29) $t_k = 1 - \frac{k}{N}\Big( a_1 + a_2 + \frac{k-1}{N}\big(1 - (a_1 + a_2)\big) \Big)$,






which is non negative for all $k \in I$. The spectral gap is $1 - t_1 = (a_1 + a_2)/N$.

When $a_1 = a_2 = 1$, $t_k = 1 - \frac{k}{N^2}(2N + 1 - k)$ and the spectral measure is given by $\mu_k = \frac{2N + 1 - 2k}{2N + 1 - k}\, \binom{N}{k} \big/ \binom{2N-k}{N}$. The expected return time to $0$ is $\binom{2N}{N} \sim 2^{2N}/\sqrt{\pi N}$, whereas the expected return time to $N/2$ is of order $\sqrt{\pi N}/2$, much smaller.

When $a_1 + a_2 = 1$, $p(u) = a_1$ is constant and the transition probabilities become affine functions of the state. Here $\pi(x) = \binom{N}{x} a_1^x a_2^{N-x}$, $\mu_k = \binom{N}{k} a_1^k a_2^{N-k}$ and $t_k = 1 - \frac{k}{N}$. When $a_1 = 1/2$, the holding probabilities are $r_x = 1/2$ and both $\pi(x)$ and $\mu_k$ are symmetric Binomial$(N, 1/2)$ distributed.
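For the constant-bias case $a_1 + a_2 = 1$ the spectral data can be tested against the chain itself. The sketch below relies on the standard Karlin-McGregor identity $P^n(0,0) = \sum_k \mu_k t_k^n$, taken here as an assumption, together with $\mu_k = \binom{N}{k} a_1^k a_2^{N-k}$ and $t_k = 1 - k/N$; the function names are illustrative.

```python
from math import comb


def moran_matrix(bias, N):
    """Transition matrix of the Moran chain with the quadratic rates."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for x in range(N + 1):
        up = (1 - x / N) * bias(x / N)
        down = (x / N) * (1 - bias(x / N))
        if x < N:
            P[x][x + 1] = up
        if x > 0:
            P[x][x - 1] = down
        P[x][x] = 1.0 - up - down
    return P


def return_probability(P, n):
    """P^n(0, 0), obtained by pushing the point mass at 0 forward n steps."""
    size = len(P)
    dist = [1.0] + [0.0] * (size - 1)
    for _ in range(n):
        dist = [sum(dist[x] * P[x][y] for x in range(size)) for y in range(size)]
    return dist[0]


def spectral_moment(a1, N, n):
    """sum_k mu_k t_k^n with mu_k = C(N,k) a1^k a2^(N-k) and t_k = 1 - k/N."""
    a2 = 1.0 - a1
    return sum(comb(N, k) * a1**k * a2**(N - k) * (1 - k / N)**n for k in range(N + 1))


N, a1 = 6, 0.3
P = moran_matrix(lambda u: a1, N)
mismatch = max(abs(return_probability(P, n) - spectral_moment(a1, N, n)) for n in range(9))
```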

Cases with positive eigenvalues. We may look for conditions on the mechanism $p$ leading to $r_x \ge 1/2$ for all $x$, in which case the quadratic form argument of Proposition 9 (ii) shows that the BD chain is spectrally non negative. Assume $p : [0,1] \to (0,1)$ is continuous, non-decreasing and so $0 < p(0) \le p(1) < 1$.

Then, as can easily be checked when $N$ is even:

$r_x \ge 1/2$ $\forall x \in I$ $\iff$ $p(1/2) = 1/2$ with $p(0) \le \frac{1}{2} \le p(1)$.

Indeed, since $r_x - \frac{1}{2} = \frac{1}{2}(1 - 2p(u))(1 - 2u)$ with $u = x/N$, imposing $r_x \ge 1/2$ for all $x$ leads to $p(u) \ge 1/2$ if $u > 1/2$, $p(u) \le 1/2$ if $u < 1/2$ and $p(1/2) = 1/2$. Since $p$ is non-decreasing these conditions are equivalent to $p(1/2) = 1/2$. The reciprocal also holds. When $N$ is odd an analogous condition can be written. When the mutation mechanism satisfies $0 < a_1 < 1 - a_2 < 1$, the condition $p(1/2) = 1/2$ leads to $a_1 = a_2$. However, it is easy to see that the condition $a_1 = a_2$ is not necessary for $P$ to be spectrally positive.
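The condition on the holding probabilities follows from a one-line identity: with $u = x/N$ and the quadratic Moran rates, $r_x - 1/2 = \frac{1}{2}(1 - 2p(u))(1 - 2u)$. The sketch below checks this identity and the resulting dichotomy on two hypothetical affine biases.

```python
def holding_probability(bias, N, x):
    """r_x = 1 - p_x - q_x for the Moran chain with quadratic rates."""
    u = x / N
    return 1.0 - (1 - u) * bias(u) - u * (1 - bias(u))


def factorised(bias, N, x):
    """The same quantity written as 1/2 + (1 - 2 p(u)) (1 - 2 u) / 2."""
    u = x / N
    return 0.5 + 0.5 * (1 - 2 * bias(u)) * (1 - 2 * u)


N = 10
balanced = lambda u: 0.2 + 0.6 * u     # a1 = a2 = 0.2, so p(1/2) = 1/2
unbalanced = lambda u: 0.1 + 0.6 * u   # a1 = 0.1, a2 = 0.3, so p(1/2) = 0.4

identity_gap = max(
    abs(holding_probability(b, N, x) - factorised(b, N, x))
    for b in (balanced, unbalanced)
    for x in range(N + 1)
)
min_r_balanced = min(holding_probability(balanced, N, x) for x in range(N + 1))
min_r_unbalanced = min(holding_probability(unbalanced, N, x) for x in range(N + 1))
```

In the balanced case every $r_x$ is at least $1/2$, with equality at $x = N/2$; in the unbalanced case some $r_x$ drops below $1/2$.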

4.7. Generalized ultrametric case. Let us examine another triangular matrix $H$ that is also a potential matrix. It belongs to the class of generalized ultrametric matrices (see [19], [16]), a class that contains the ultrametric matrices introduced in HZ].

Let $C$ be a nonempty set strictly contained in $I = \{0, \dots, N\}$. Denote $\overline{C} = I \setminus C$. We put $C(x) = C$ when $x \in C$ and $C(x) = \overline{C}$ otherwise. Take $\alpha, \beta \ge 0$, and put $\gamma(x) = \alpha$ if $x \in C$ and $\gamma(x) = \beta$ otherwise. Now, define the matrix $H_{\alpha,\beta}$ by

$H_{\alpha,\beta}(x,y) = 1(x \le y) + \gamma(x)\,1(x \le y)\,1(C(x) = C(y))$,

which is a clear generalization of the Siegmund dual because $H_{0,0} = H_S$. It is straightforward to check that $H_{\alpha,\beta}$ belongs to the class of potential matrices introduced in Subsection 4.1; indeed $H_{\alpha,\beta} = (\mathrm{Id} - R)^{-1}$ with

$R(x,y) = 1(x = y) - \frac{1}{1 + \gamma(x)}\,1(x = y) + \frac{1}{1 + \gamma(x)}\,1(x + 1 = y)$.

As it is easily checked, $R$ is an irreducible strictly substochastic matrix that loses mass through the state $N$. Then

$H_{\alpha,\beta}^{-1} = \mathrm{Id} - R$, so $H_{\alpha,\beta}^{-1}(x,y) = \frac{1}{1 + \gamma(x)}\,1(x = y) - \frac{1}{1 + \gamma(x)}\,1(x + 1 = y)$.

In this case we are able to compute the inverse matrix $H_{\alpha,\beta}^{-1}$; the description of the inverse of any generalized ultrametric matrix can be found in [5]. We point out that $R'$ is substochastic only when $\alpha \ge \beta$ and in this case it is an irreducible strictly substochastic matrix that loses mass through the state $0$. In the rest of this Subsection, we will put $H = H_{\alpha,\beta}$ to avoid overburdening the notation.

We have,

$(H\widehat{P}')(x,y) = \sum_{z \ge x} H(x,z)\,\widehat{P}(y,z) = \sum_{z \ge x} \widehat{P}(y,z) + \gamma(x) \sum_{z \ge x,\, z \in C(x)} \widehat{P}(y,z)$,

$(PH)(x,y) = \sum_{z \le y} P(x,z)\,H(z,y) = \sum_{z \le y} P(x,z) + \gamma(y) \sum_{z \le y,\, z \in C(y)} P(x,z)$.

By permuting $I$ we can always assume that $C$ is an interval, that is $C = \{0, \dots, k\}$ for some $0 \le k < N$, and so $\overline{C} = \{k+1, \dots, N\}$. With this choice we have that each $x \notin \{k, N\}$ verifies $C(x) = C(x+1)$ and so $\gamma(x) = \gamma(x+1)$.

(i). Let $x \ne k$. From the above equalities we find

$(H\widehat{P}')(x,y) - (H\widehat{P}')(x+1,y) = (1 + \gamma(x))\,\widehat{P}(y,x)$

(the case $x = N$ follows from $(H\widehat{P}')(N+1,y) = 0$). Then the equality $H\widehat{P}' = PH$ implies,

$(1 + \gamma(x))\,\widehat{P}(y,x) = \sum_{z \le y} (P(x,z) - P(x+1,z)) + \gamma(y) \sum_{z \le y,\, z \in C(y)} (P(x,z) - P(x+1,z))$.

(i1). Let $x \ne k$ and $y \le k$. In this case we have $\gamma(y) = \alpha$ and $z \le y$ implies $z \in C(y)$. So, we find

$(1 + \gamma(x))\,\widehat{P}(y,x) = (1 + \alpha) \sum_{z \le y} (P(x,z) - P(x+1,z))$.

Then, a necessary and sufficient condition for $\widehat{P}(y,x) \ge 0$ is that

(30) $\sum_{z \le y} P(x+1,z) \le \sum_{z \le y} P(x,z)$,

and we get

(31) $\widehat{P}(y,x) = \frac{1+\alpha}{1+\gamma(x)} \sum_{z \le y} (P(x,z) - P(x+1,z))$.

(i2). Let $x \ne k$ and $y > k$. In this case we have that $\gamma(y) = \beta$, and $C(z) = C(y)$ if and only if $z > k$. Then,

$(1 + \gamma(x))\,\widehat{P}(y,x) = \sum_{z \le y} (P(x,z) - P(x+1,z)) + \beta \sum_{k < z \le y} (P(x,z) - P(x+1,z))$,

and so

(32) $\widehat{P}(y,x) = \frac{1}{1+\gamma(x)} \sum_{z \le k} (P(x,z) - P(x+1,z)) + \frac{1+\beta}{1+\gamma(x)} \sum_{k < z \le y} (P(x,z) - P(x+1,z))$.

Then, a necessary and sufficient condition in order that $\widehat{P}(y,x) \ge 0$ for $x \ne k$ is that

(33) $\sum_{z \le k} P(x+1,z) + (1+\beta) \sum_{k < z \le y} P(x+1,z) \le \sum_{z \le k} P(x,z) + (1+\beta) \sum_{k < z \le y} P(x,z)$.

We can summarize subcases (i1) and (i2) as: for all $x \ne k$,

$\widehat{P}(y,x) = \frac{1+\gamma(y)}{1+\gamma(x)} \sum_{z \le y} (P(x,z) - P(x+1,z)) - 1(y > k)\, \frac{\beta}{1+\gamma(x)} \sum_{z \le k} (P(x,z) - P(x+1,z))$.

The necessary and sufficient condition in order that $\widehat{P}(y,x) \ge 0$ for $x \ne k$ is constituted by (30) and (33).



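(ii). Assume $x = k$. Recall that $\gamma(k) = \alpha$ and $\gamma(k+1) = \beta$, so

$(H\widehat{P}')(k,y) = \sum_{z \ge k} \widehat{P}(y,z) + \alpha \sum_{z \ge k,\, z \in C(k)} \widehat{P}(y,z) = \sum_{z > k} \widehat{P}(y,z) + (1+\alpha)\,\widehat{P}(y,k)$,

$(H\widehat{P}')(k+1,y) = \sum_{z > k} \widehat{P}(y,z) + \beta \sum_{z > k,\, z \in C(k+1)} \widehat{P}(y,z) = (1+\beta) \sum_{z > k} \widehat{P}(y,z)$.

Then, by using $H\widehat{P}' = PH$,

$(1+\alpha)\,\widehat{P}(y,k) = (H\widehat{P}')(k,y) - (H\widehat{P}')(k+1,y) + \beta \sum_{z > k} \widehat{P}(y,z) = (PH)(k,y) - \frac{1}{1+\beta}\,(PH)(k+1,y)$,

and so

(34) $(1+\alpha)\,\widehat{P}(y,k) = \sum_{z \le y} \Big( P(k,z) - \frac{1}{1+\beta} P(k+1,z) \Big) + \gamma(y) \sum_{z \le y,\, z \in C(y)} \Big( P(k,z) - \frac{1}{1+\beta} P(k+1,z) \Big)$.

From (34) we deduce,

(35) $y \le k$: $\widehat{P}(y,k) = \sum_{z \le y} \Big( P(k,z) - \frac{1}{1+\beta} P(k+1,z) \Big)$

and

(36) $y > k$: $\widehat{P}(y,k) = \frac{1}{1+\alpha} \sum_{z \le k} \Big( P(k,z) - \frac{1}{1+\beta} P(k+1,z) \Big) + \frac{1+\beta}{1+\alpha} \sum_{k < z \le y} \Big( P(k,z) - \frac{1}{1+\beta} P(k+1,z) \Big)$.

Hence, the equations (30) and (33) imply that $\widehat{P}(y,k) \ge 0$ for all $y$, and then they are necessary and sufficient for $\widehat{P} \ge 0$.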

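From (31) we find,

$\forall y \le k$: $\sum_{x < k} \widehat{P}(y,x) = \sum_{z \le y} (P(0,z) - P(k,z))$, $\sum_{x > k} \widehat{P}(y,x) = \frac{1+\alpha}{1+\beta} \sum_{z \le y} P(k+1,z)$.

So, by using (35) we get

(37) $\forall y \le k$: $\sum_{x \le N} \widehat{P}(y,x) = \sum_{z \le y} P(0,z) + \frac{\alpha}{1+\beta} \sum_{z \le y} P(k+1,z)$.

On the other hand, from (32) we obtain,

$\forall y > k$: $\sum_{x < k} \widehat{P}(y,x) = \frac{1}{1+\alpha} \sum_{z \le k} (P(0,z) - P(k,z)) + \frac{1+\beta}{1+\alpha} \sum_{k < z \le y} (P(0,z) - P(k,z))$,

$\sum_{x > k} \widehat{P}(y,x) = \frac{1}{1+\beta} \sum_{z \le k} P(k+1,z) + \sum_{k < z \le y} P(k+1,z)$.

By using (36) we get

(38) $\forall y > k$: $\sum_{x \le N} \widehat{P}(y,x) = \frac{1}{1+\alpha} \sum_{z \le k} P(0,z) + \frac{1+\beta}{1+\alpha} \sum_{k < z \le y} P(0,z) + \frac{\alpha}{(1+\alpha)(1+\beta)} \sum_{z \le k} P(k+1,z) + \frac{\alpha}{1+\alpha} \sum_{k < z \le y} P(k+1,z)$.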

Proposition 11. Let $P$ be a stochastic kernel and let $\widehat{P}$ be a $H_{\alpha,\beta}$-dual of $P$. Then, a sufficient condition to have $\widehat{P} \ge 0$ is the following one:



(39) 


36 e (0,1 


(40) 


\fy < k 


(41) 


\fy > k 



z<k 



z<y 



k<z<y 



Moreover, under the conditions i39\), and \41\ ), P is substochastic if and only 



In this case P is conservative at sites k and N. 



Proof. The relations (39), (40) and (41) are sufficient for $\widehat{P} \ge 0$ because they imply the conditions (30) and (33). Now put

$L(y) = \sum_{x \le N} \widehat{P}(y,x)$.

From (37) we find that $\{L(y) : y \le k\}$ attains its maximum at $y = k$ and by using (39) this maximum becomes $L(k) = \delta + \frac{\alpha\delta}{1+\beta}$. So, this last quantity must be at most $1$ in order that $\widehat{P}$ is substochastic. On the other hand, from (38) it follows that $\{L(y) : y > k\}$ attains its maximum at $y = N$ and that this maximum is

$L(N) = \frac{\delta}{1+\alpha} + \frac{(1+\beta)(1-\delta)}{1+\alpha} + \frac{\alpha\delta}{(1+\alpha)(1+\beta)} + \frac{\alpha(1-\delta)}{1+\alpha}$.

By straightforward computations it follows that

$L(N) = \frac{1}{1+\alpha}\big( 1 + \alpha + \beta\,(1 - L(k)) \big)$.

Then, by using $L(k) \le 1$ we deduce that $L(N) \le 1$ if and only if $L(k) = 1$, in which case $L(N) = 1$. The result is shown. □

If the ultrametric dual is seen as a perturbation of the Siegmund dual then there 
is a rigidity result for the BD chains. 
Proposition 12. Let $P$ be the stochastic kernel of an irreducible BD chain on $I = \{0, \dots, N\}$. Assume that there exists a substochastic kernel $\widehat{P}$ that is a $H_{\alpha,\beta}$-dual of $P$, $H_{\alpha,\beta}\widehat{P}' = PH_{\alpha,\beta}$.

Then we necessarily have $\beta = 0$ and the monotone property (24) is verified. Moreover, if $k \ge 1$ then $\alpha = \beta = 0$ and $H_{\alpha,\beta} = H_{0,0} = H_S$ is the Siegmund dual.

If $k = 0$ then $\alpha \le (1 - p_0)/q_1$. If $\alpha = (1 - p_0)/q_1$ the kernel $\widehat{P}$ is stochastic, and when $\alpha < (1 - p_0)/q_1$ the kernel $\widehat{P}$ is substochastic and it only loses mass through $\{0\}$.



Proof. From (32) we have

$\widehat{P}(k+2, k-1) = \frac{1}{1+\alpha} \sum_{z \le k} (P(k-1,z) - P(k,z)) + \frac{1+\beta}{1+\alpha} \sum_{k < z \le k+2} (P(k-1,z) - P(k,z))$

$= \frac{1}{1+\alpha}\big( 1 - (1 - P(k,k+1)) \big) - \frac{1+\beta}{1+\alpha}\,P(k,k+1)$.

So $\widehat{P}(k+2, k-1) = -P(k,k+1)\,\frac{\beta}{1+\alpha}$, and we must necessarily have $\beta = 0$.

Since $\beta = 0$, from relations (30) and (33) it results that the condition to have $\widehat{P} \ge 0$ is that (24) is fulfilled, that is $p_x + q_{x+1} \le 1$ $\forall x \in \{0, \dots, N-1\}$.

On the other hand if $k \ge 1$ we get from (37) that for $y = k$,

$\sum_{x \in I} \widehat{P}(k,x) = \sum_{z \le k} P(0,z) + \frac{\alpha}{1+\beta} \sum_{z \le k} P(k+1,z) = 1 + \frac{\alpha}{1+\beta}\,P(k+1,k)$.

So, we must necessarily have $\alpha = 0$.

In the case $k = 0$, from relation (38) it results that $\sum_{y \in I} \widehat{P}(x,y) = 1$ for all $x > 0$. The only case we must examine is (37) for $k = 0$, and the condition $\sum_{y \in I} \widehat{P}(0,y) = (1 - p_0) + \alpha q_1 \le 1$ implies $\alpha \le (1 - p_0)/q_1$. □
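The Siegmund case $\alpha = \beta = 0$ of formula (31), $\widehat{P}(y,x) = \sum_{z \le y}(P(x,z) - P(x+1,z))$ with $P(N+1,\cdot) = 0$, is easy to experiment with. The sketch below builds this dual for two hypothetical BD chains, checks the duality relation $H\widehat{P}' = PH$ for $H(x,y) = 1(x \le y)$, and shows that non negativity of the dual fails exactly when $p_x + q_{x+1} > 1$ for some $x$.

```python
def bd_matrix(p, q):
    """Transition matrix of a BD chain with up rates p and down rates q."""
    n = len(p)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        if x + 1 < n:
            P[x][x + 1] = p[x]
        if x > 0:
            P[x][x - 1] = q[x]
        P[x][x] = 1.0 - p[x] - q[x]
    return P


def siegmund_dual(P):
    """Phat[y][x] = sum_{z <= y} (P[x][z] - P[x+1][z]), with row N+1 set to zero."""
    n = len(P)
    F = [[sum(P[x][: y + 1]) for y in range(n)] for x in range(n)]  # cumulative rows
    F.append([0.0] * n)
    return [[F[x][y] - F[x + 1][y] for x in range(n)] for y in range(n)]


def duality_gap(P, Phat):
    """Largest entry of |H Phat' - P H| for H(x, y) = 1(x <= y)."""
    n = len(P)
    gap = 0.0
    for x in range(n):
        for y in range(n):
            lhs = sum(Phat[y][z] for z in range(x, n))
            rhs = sum(P[x][z] for z in range(y + 1))
            gap = max(gap, abs(lhs - rhs))
    return gap


P_mono = bd_matrix([0.4, 0.3, 0.0], [0.0, 0.3, 0.5])   # p_x + q_{x+1} <= 1
P_bad = bd_matrix([0.9, 0.5, 0.0], [0.0, 0.5, 0.5])    # p_0 + q_1 = 1.4 > 1
dual_mono, dual_bad = siegmund_dual(P_mono), siegmund_dual(P_bad)
min_mono = min(min(row) for row in dual_mono)
min_bad = min(min(row) for row in dual_bad)
```

The duality identity holds algebraically in both cases; only the sign of the entries distinguishes the monotone chain.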



5. Strong Stationary Times 

Let $P$ be an irreducible positive recurrent stochastic kernel on the countable set $I$ and $X = (X_n : n \ge 0)$ be a Markov chain with kernel $P$. Let $\pi$ be the stationary probability measure of $X$. We denote by $\pi_0$ the initial distribution of $X$ and in general $\pi_n$ is the distribution of $X_n$, $\pi_n(\cdot) = \mathbb{P}_{\pi_0}(X_n = \cdot)$. It verifies $\pi_n' = \pi_0' P^n$.

A random time $T$ is called a strong stationary time for $X$ if $X_T$ has distribution $\pi$ and it is independent of $T$, see pQ . The separation discrepancy is defined by,

$\mathrm{sep}(\pi_n, \pi) := \sup_{y \in I} \Big[ 1 - \frac{\pi_n(y)}{\pi(y)} \Big]$.

It satisfies $\mathrm{sep}(\pi_n, \pi) \ge \|\pi_n - \pi\|_{TV}$ where $\|\pi_n - \pi\|_{TV} = \frac{1}{2} \sum_{y \in I} |\pi_n(y) - \pi(y)|$ is the total variation distance between $\pi_n$ and $\pi$, see [1] and [4]. In Proposition 2.10 in pQ it was proven that every strong stationary time $T$ verifies

(42) $\mathrm{sep}(\pi_n, \pi) \le \mathbb{P}_{\pi_0}(T > n)$, $n \ge 0$.

Based upon this result the strong stationary time $T$ is called sharp when there is equality in (42), that is

$\mathrm{sep}(\pi_n, \pi) = \mathbb{P}_{\pi_0}(T > n)$, $n \ge 0$.

In Proposition 3.2 in pQ it was shown that a sharp strong stationary time always exists.
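To make the two distances concrete, here is a minimal sketch that tracks $\mathrm{sep}(\pi_n, \pi)$ and $\|\pi_n - \pi\|_{TV}$ for a hypothetical 3-state chain started at a point mass; the matrix and its stationary vector are chosen only for illustration.

```python
def push(dist, P):
    """One step of the chain: the row vector dist times the matrix P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def separation(dist, pi):
    """sep(dist, pi) = max_y [1 - dist(y) / pi(y)]."""
    return max(1.0 - dist[y] / pi[y] for y in range(len(pi)))


def total_variation(dist, pi):
    """Half of the l1 distance between dist and pi."""
    return 0.5 * sum(abs(a - b) for a, b in zip(dist, pi))


P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]          # stationary for P

dist = [1.0, 0.0, 0.0]          # start at state 0
history = []
for n in range(8):
    history.append((separation(dist, pi), total_variation(dist, pi)))
    dist = push(dist, P)
```

Each pair in `history` illustrates the inequality $\mathrm{sep} \ge \mathrm{TV}$; at time $0$ the separation equals $1$ because some state has zero mass.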



Let $\widetilde{P}$ be a stochastic kernel on the countable set $\widehat{I}$ such that $\widetilde{P}$ is a $\Lambda$-intertwining of $P$ where $\Lambda$ is a nonsingular stochastic kernel, so $\widetilde{P}\Lambda = \Lambda P$. Let $\widetilde{X} = (\widetilde{X}_n : n \ge 0)$ be a Markov chain with kernel $\widetilde{P}$.

Recall that when we are in the framework of Theorem 2 we have $\widetilde{P}\Lambda = \Lambda\overleftarrow{P}$, so $\widetilde{P}$ is a $\Lambda$-intertwining of the reversal kernel $\overleftarrow{P}$. Hence, when the intertwining is constructed from a dual relation, $\overleftarrow{P}$ and the reversed chain $\overleftarrow{X}$ will play the role of $P$ and $X$ in the intertwining relation. In the reversible case both notations coincide, that is $\overleftarrow{P} = P$ and we can take $\overleftarrow{X} = X$; this occurs for instance when $P$ is the kernel of an irreducible BD chain.

The initial probability distributions of the chains $X$ and $\widetilde{X}$ will be respectively $\pi_0$ and $\widetilde{\pi}_0$, that is $X_0 \sim \pi_0$ and $\widetilde{X}_0 \sim \widetilde{\pi}_0$. We assume that the initial distributions are linked, this means:

(43) $\pi_0' = \widetilde{\pi}_0' \Lambda$.

When this relation is verified we say that $\pi_0$ and $\widetilde{\pi}_0$ form an admissible initial condition. Let $\pi_n$ and $\widetilde{\pi}_n$ be the distributions of $X_n$ and $\widetilde{X}_n$. By the intertwining relation $\widetilde{P}^n \Lambda = \Lambda P^n$ for all $n \ge 1$, and the initial condition (43), we get

$\pi_n' = \widetilde{\pi}_n' \Lambda$ $\forall n \ge 0$.

5.1. The coupling. Let us recall the coupling done in [4] between the intertwining Markov chains. Consider the kernel $\overline{P}$ defined on $I \times \widehat{I}$ by:

$\overline{P}\big((x, \widehat{x}), (y, \widehat{y})\big) = \frac{P(x,y)\,\widetilde{P}(\widehat{x}, \widehat{y})\,\Lambda(\widehat{y}, y)}{(\Lambda P)(\widehat{x}, y)}\; 1\big( (\Lambda P)(\widehat{x}, y) > 0 \big)$.

The kernel $\overline{P}$ is stochastic. Let $\overline{X} = (\overline{X}_n : n \ge 0)$ be the chain taking values in $I \times \widehat{I}$, evolving with the kernel $\overline{P}$ and having as initial distribution the vector $(\pi_0, \widetilde{\pi}_0)$ where $\pi_0' = \widetilde{\pi}_0' \Lambda$. It can be checked that $\overline{X}$ is a coupling of the chains $X$ and $\widetilde{X}$. Then, in the sequel we will denote by $X$ and $\widetilde{X}$ the components of $\overline{X}$, so $\overline{X}_n = (X_n, \widetilde{X}_n)$ for all $n \ge 0$. In the above construction it can be also checked that,

(44) $\Lambda(\widehat{x}, x) = \mathbb{P}\big( X_n = x \mid \widetilde{X}_n = \widehat{x} \big)$ $\forall n \ge 0$.

(For this equality also see [2]). In [4] this coupling was characterized as the unique one that verifies (44) and three other properties on conditional independence. These properties imply that the coupling also satisfies,

$\Lambda(\widehat{x}_n, x_n) = \mathbb{P}\big( X_n = x_n \mid \widetilde{X}_0 = \widehat{x}_0, \dots, \widetilde{X}_n = \widehat{x}_n \big)$ $\forall n \ge 0$.

In this process the original ergodic Markov chain $X$ governed by $P$ may be viewed as a random output of the Markov process $\widetilde{X}$ governed by $\widetilde{P} = \Lambda P \Lambda^{-1}$, when $\Lambda$ is non singular. This is a setup reminiscent of filtering theory with $\widetilde{X}$ the hidden process and $X$ the observable. The peculiarity of the intertwining construction is that the output process $X$ is itself Markov.

The following concept was introduced in [?].

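Definition 3. The Markov chain $\widetilde{X}$ will be called a strong stationary dual of the Markov chain $X$, if $\widetilde{X}$ has an absorbing state $\widehat{\partial}$ that verifies

(45) $\pi(x) = \mathbb{P}\big( X_n = x \mid \widetilde{X}_0 = \widehat{x}_0, \dots, \widetilde{X}_{n-1} = \widehat{x}_{n-1}, \widetilde{X}_n = \widehat{\partial} \big)$ $\forall x \in I$, $n \ge 0$,

and where $\widehat{x}_0, \dots, \widehat{x}_{n-1} \in \widehat{I}$ satisfy $\mathbb{P}\big( \widetilde{X}_0 = \widehat{x}_0, \dots, \widetilde{X}_{n-1} = \widehat{x}_{n-1}, \widetilde{X}_n = \widehat{\partial} \big) > 0$.

In Theorem 2.4 in pQ it was shown that when the condition (45) holds then the absorption time $\widetilde{T}_{\widehat{\partial}}$ at $\{\widehat{\partial}\}$ is a strong stationary time for $X$. Moreover in Remark 2.8 in [4] a specific dual process $\widetilde{X}$ is built, having an absorbing state $\widehat{\partial}$ and whose absorption time $\widetilde{T}_{\widehat{\partial}}$ is sharp.

Assume that $\widehat{\partial}$ is an absorbing state for $\widetilde{X}$. From (4) we get $\pi' = e_{\widehat{\partial}}' \Lambda$. When the initial conditions are linked by relation (43), $\pi_0' = \widetilde{\pi}_0' \Lambda$, we get that $\widetilde{T}_{\widehat{\partial}}$ is a strong stationary time for $X$. Indeed, from $\Lambda(\widehat{\partial}, x) = \mathbb{P}\big( X_n = x \mid \widetilde{X}_0 = \widehat{x}_0, \dots, \widetilde{X}_n = \widehat{\partial} \big)$, it follows that $\pi(x) = \mathbb{P}\big( X_n = x \mid \widetilde{X}_0 = \widehat{x}_0, \dots, \widetilde{X}_n = \widehat{\partial} \big)$ is verified because condition $\pi' = e_{\widehat{\partial}}' \Lambda$ holds. Observe that for the Siegmund dual and monotone kernels (that is, verifying (24)) the absorbing state is $\widehat{\partial} = N$.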

5.2. Choice of the initial conditions. Let $\widehat{\partial}$ be an absorbing state of $\widetilde{X}$. From (43) the initial conditions of the chains must verify $\pi_0' = \widetilde{\pi}_0' \Lambda$ to be able to perform the duality construction and to get that the absorption time $\widetilde{T}_{\widehat{\partial}}$ is a strong stationary time for $X$.

Assume that $\widehat{I} = I$. Since $\Lambda$ is a stochastic matrix it has a left probability eigenvector $\pi_\Lambda$ satisfying $\pi_\Lambda' = \pi_\Lambda' \Lambda$. So, we can choose $X_0 \sim \widetilde{X}_0 \sim \pi_\Lambda$ because (43) is satisfied (we also use $\sim$ to mean 'distributed as'). Then, when $\widetilde{X}$ is initially distributed as $\pi_\Lambda$, $\widetilde{T}_{\widehat{\partial}}$ is a strong stationary time for the chain $X$ starting from $\pi_\Lambda$.

If $\Lambda$ is non irreducible then $\pi_\Lambda$ could fail to be strictly positive. This is the case for the Siegmund kernel. In fact, from (21) it can be checked that $e_0$ is the unique left eigenvector satisfying $e_0' = e_0' \Lambda$ and so $\pi_\Lambda = e_0$. Then, for the $\Lambda$-intertwining given by (21) the initial condition $X_0 \sim \delta_0$ and $\widetilde{X}_0 \sim \delta_0$ is admissible.

When $\widehat{b} \in \widehat{I}$, $b \in I$, $\widehat{b} \ne \widehat{\partial}$, verify $e_{\widehat{b}}' \Lambda = e_b'$, then $\widetilde{X}_0 \sim \delta_{\widehat{b}}$ and $X_0 \sim \delta_b$ is an admissible initial condition (it verifies (43)). Then $\widetilde{T}_{\widehat{\partial}}$ starting from $\widehat{b}$ is a strong stationary time for $X$ starting from $b$, and it is strictly positive. In this case, both $X_0$ and $\widetilde{X}_0$ start at a single point. We observe that the condition $e_{\widehat{b}}' \Lambda = e_b'$ is equivalent to the following condition on the dual function: $H e_{\widehat{b}} = c\, e_b$ for some $c > 0$. Indeed if $H$ verifies this condition and since $\Lambda = D_\varphi^{-1} H' D_\pi$ (see (5)) we obtain $e_{\widehat{b}}' \Lambda = c' e_b'$ with $c' = c\,\pi(b)/\varphi(\widehat{b})$. Since $\Lambda$ is stochastic we get $c' = 1$, and so $c = \varphi(\widehat{b})/\pi(b)$. This gives $H e_{\widehat{b}} = \frac{\varphi(\widehat{b})}{\pi(b)}\, e_b$, which is exactly $e_{\widehat{b}}' \Lambda = e_b'$.

For the Siegmund kernel and $P$ monotone, $\Lambda$ is given by (21) and the equation (43) takes the form

$\pi_0(x) = \sum_{z = x}^{N} \widetilde{\pi}_0(z)\, \frac{\pi(x)}{\pi^c(z)}$ $\forall x \in I$.

So we need that $\pi_0(x)/\pi(x)$ decreases with $x \in I$ and in this case $\widetilde{\pi}_0(x) = \pi^c(x)\big( \pi_0(x)/\pi(x) - \pi_0(x+1)/\pi(x+1) \big)$. These are, respectively, condition (4.7) and formula (4.10) in 0].

We recall that every monotone kernel $P$ verifies condition $\pi' = e_N' \Lambda$ (see (3)). The $\Lambda$-intertwining $\widetilde{P}$ is the one of $\overleftarrow{P}$, and in this case $\overleftarrow{X}$ and $\widetilde{X}$ denote the Markov chains associated to $\overleftarrow{P}$ and $\widetilde{P}$, respectively.

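5.3. Conditions for sharpness. We now give a proof of the sharpness result alluded to in Remark 2.39 of [1] and in Theorem 2.1 in [7].

Proposition 13. Let $X$ be an irreducible positive recurrent Markov chain, $\widetilde{X}$ be a $\Lambda$-intertwining of $X$ having $\widehat{\partial}$ as an absorbing state. Assume that there exists $d \in I$ such that

(46) $\Lambda e_d = \pi(d)\, e_{\widehat{\partial}}$.

Then $\widetilde{X}$ is a sharp dual to $X$, that is for $X_0 \sim \pi_0$ and $\widetilde{X}_0 \sim \widetilde{\pi}_0$ with $\pi_0' = \widetilde{\pi}_0' \Lambda$, we have:

(47) $\mathrm{sep}(\pi_n, \pi) = \mathbb{P}_{\widetilde{\pi}_0}(\widetilde{T}_{\widehat{\partial}} > n)$ $\forall n \ge 0$.

Proof. From condition $\Lambda e_d = \pi(d)\, e_{\widehat{\partial}}$ we get,

(48) $\pi_n(d) = \pi_n' e_d = \widetilde{\pi}_n' \Lambda e_d = \pi(d)\, \widetilde{\pi}_n(\widehat{\partial})$.

Since $\pi > 0$, the last equalities imply that

(49) $\pi_n(d) > 0 \iff \widetilde{\pi}_n(\widehat{\partial}) > 0$.

On the other hand the condition $\pi' = e_{\widehat{\partial}}' \Lambda$ means that the $\widehat{\partial}$-row of $\Lambda$ verifies $\Lambda(\widehat{\partial}, \cdot) = \pi'(\cdot) > 0$. Then, if for some $n$ we have $\widetilde{\pi}_n(\widehat{\partial}) > 0$, from $\pi_n' = \widetilde{\pi}_n' \Lambda$ we deduce $\pi_n > 0$. Moreover,

$\pi_n(x) = \sum_{\widehat{x} \in \widehat{I}} \widetilde{\pi}_n(\widehat{x})\, \Lambda(\widehat{x}, x) \ge \widetilde{\pi}_n(\widehat{\partial})\, \Lambda(\widehat{\partial}, x) = \widetilde{\pi}_n(\widehat{\partial})\, \pi(x)$.

Therefore, from (48) we get

$\min_{x \in I} \frac{\pi_n(x)}{\pi(x)} = \widetilde{\pi}_n(\widehat{\partial}) = \frac{\pi_n(d)}{\pi(d)}$.

Then, $\mathrm{sep}(\pi_n, \pi) = 1 - \widetilde{\pi}_n(\widehat{\partial})$. Since $\widehat{\partial}$ is an absorbing state, which implies $\widetilde{\pi}_n(\widehat{\partial}) = \mathbb{P}_{\widetilde{\pi}_0}(\widetilde{T}_{\widehat{\partial}} \le n)$, we get the desired relation

$\mathrm{sep}(\pi_n, \pi) = \mathbb{P}_{\widetilde{\pi}_0}(\widetilde{T}_{\widehat{\partial}} > n)$ $\forall n \ge n_+$, with $n_+ = \inf\{ n \ge 0 : \widetilde{\pi}_n(\widehat{\partial}) > 0 \}$.

Let us show that the relation (47) holds for $n < n_+$. First remark that in this case $\widetilde{\pi}_n(\widehat{\partial}) = 0$, which by (49) implies $\pi_n(d) = 0$. Then $\mathrm{sep}(\pi_n, \pi) = 1$ and so the equality $\mathrm{sep}(\pi_n, \pi) = \mathbb{P}_{\widetilde{\pi}_0}(\widetilde{T}_{\widehat{\partial}} > n) = 1$ holds. We have proven that $\widetilde{X}$ is a sharp dual to $X$. □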
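The sharpness identity can be observed on a small example. The sketch below assumes the Siegmund construction for a monotone BD chain in the form $\Lambda(\widehat{x}, x) = 1(x \le \widehat{x})\,\pi(x)/\pi^c(\widehat{x})$ and $\widetilde{P}(\widehat{x}, \widehat{y}) = \widehat{P}(\widehat{x}, \widehat{y})\,\pi^c(\widehat{y})/\pi^c(\widehat{x})$, where $\pi^c$ is the cumulative distribution of $\pi$ and $\widehat{P}$ is the Siegmund dual; the chain and all names are hypothetical.

```python
def push(dist, K):
    """One step: the row vector dist times the matrix K."""
    n = len(K)
    return [sum(dist[x] * K[x][y] for x in range(n)) for y in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


p = [0.4, 0.3, 0.2, 0.0]        # up rates, with p_x + q_{x+1} <= 1
q = [0.0, 0.3, 0.4, 0.5]        # down rates
n = len(p)

P = [[0.0] * n for _ in range(n)]
for x in range(n):
    if x + 1 < n:
        P[x][x + 1] = p[x]
    if x > 0:
        P[x][x - 1] = q[x]
    P[x][x] = 1.0 - p[x] - q[x]

weights = [1.0]                  # detailed balance: pi(x) p_x = pi(x+1) q_{x+1}
for x in range(1, n):
    weights.append(weights[-1] * p[x - 1] / q[x])
pi = [w / sum(weights) for w in weights]
cum = [sum(pi[: x + 1]) for x in range(n)]

# Intertwining kernel: Doob transform of the Siegmund dual by the cumulative pi.
Pt = [[0.0] * n for _ in range(n)]
for y in range(n):
    q_next = q[y + 1] if y + 1 < n else 0.0
    Pt[y][y] = 1.0 - p[y] - q_next
    if y > 0:
        Pt[y][y - 1] = p[y] * cum[y - 1] / cum[y]
    if y + 1 < n:
        Pt[y][y + 1] = q_next * cum[y + 1] / cum[y]

Lam = [[pi[x] / cum[xh] if x <= xh else 0.0 for x in range(n)] for xh in range(n)]
left, right = matmul(Pt, Lam), matmul(Lam, P)
intertwining_gap = max(abs(left[i][j] - right[i][j]) for i in range(n) for j in range(n))

dist, dist_t = [1.0] + [0.0] * (n - 1), [1.0] + [0.0] * (n - 1)   # both start at 0
records = []
for step in range(12):
    sep = max(1.0 - dist[y] / pi[y] for y in range(n))
    records.append((sep, 1.0 - dist_t[n - 1]))    # (sep, P(absorption time > step))
    dist, dist_t = push(dist, P), push(dist_t, Pt)
```

In each pair of `records` the two numbers should agree, which is the content of (47) with absorbing state $N$ and witness state $d = N$.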

Proposition 14. (i) Assume the hypotheses of Theorem 2 are verified and that $\widehat{P}$ is a substochastic kernel having $\widehat{a}$ as an absorbing state. Then, if there exists some $d \in I$ such that

(50) $e_d' H = c\, e_{\widehat{a}}'$ for some $c > 0$,

then $\widehat{a}$ is an absorbing state for $\widetilde{X}$ and $\widetilde{X}$ is a sharp dual to $\overleftarrow{X}$. That is, when $\pi_0' = \widetilde{\pi}_0' \Lambda$ the relation (47) is verified.

(ii) Assume the hypotheses of Theorem 2 are verified and that $\widehat{P}$ is a substochastic kernel verifying that there exist $\widehat{a} \in \widehat{I}$, $d \in I$ such that for some constants $c > 0$, $c' > 0$ we have

(51) $H e_{\widehat{a}} = c'\, \mathbf{1}$ and $e_d' H = c\, e_{\widehat{a}}'$.

Then part (i) holds, and $\widetilde{X}$ is a sharp dual to $\overleftarrow{X}$.

Proof. (i) From Theorem 2 (v) it follows that $\widehat{a}$ is an absorbing state for $\widetilde{P}$. From Proposition 13 it suffices to show that $d$ verifies (46): $\Lambda e_d = \pi(d)\, e_{\widehat{a}}$. Since the hypothesis is $H(d, y) = c\,\delta_{y, \widehat{a}}$ for some $c > 0$ and for all $y \in \widehat{I}$, the Remark 3 implies $\Lambda(\widehat{x}, d) = c''\,\delta_{\widehat{x}, \widehat{a}}$ for some $c'' > 0$. Now, from Theorem 2 (v) and the relation $\pi' = e_{\widehat{a}}' \Lambda$ we have $\pi(d) = \Lambda(\widehat{a}, d)$, and we deduce $c'' = \pi(d)$. Therefore $\Lambda(\widehat{x}, d) = \pi(d)\,\delta_{\widehat{x}, \widehat{a}}$, which is equivalent to (46).

(ii) From Proposition 3 we get that the first relation in (51) guarantees that $\widehat{a}$ is an absorbing state for $\widehat{P}$. So, we are under the hypotheses of part (i) and the result follows. □

Corollary 15. (i) For a monotone irreducible stochastic kernel $P$, the $\Lambda$-intertwining Markov chain $\widetilde{X}$ has $N$ as absorbing state and it is a sharp dual of $\overleftarrow{X}$. Moreover, both chains $\overleftarrow{X}$ and $\widetilde{X}$ can start at state $0$.

(ii) For a monotone irreducible stochastic BD kernel $P$ we have that the BD chain $\widetilde{X}$ is a sharp dual to $X$.

Proof. For part (i), the properties required for sharpness for the Siegmund intertwining of BD chains follow straightforwardly because the $N$-th row of $H_S$ verifies (50) with $d = N$. Also the relation (22) in Corollary 6 is exactly (46). The fact that the state $0$ is admissible for both $\overleftarrow{X}$ and $\widetilde{X}$ is a consequence of $e_0' = e_0' \Lambda$. In part (ii) the only novelty is that for BD chains $\overleftarrow{P} = P$. □

We note that, by definition, for an absorbing point $\widehat{a}$ there is a unique state $d$ verifying (50), as it occurs for the Siegmund kernel.

When $d$ verifies the property (46) we say that $d$ is a witness state in $X$ that $\widetilde{X}$ hits $\widehat{\partial}$. It reflects the following more general situation. Assume that $\Lambda$ fulfills $\Lambda(\widehat{x}, y) > 0 \iff \widehat{x} \ge y$. Then, from $\pi_n' = \widetilde{\pi}_n' \Lambda$ we get

$\widetilde{\pi}_n(\widehat{x}) > 0 \implies \pi_n(y) > 0$ $\forall y \le \widehat{x}$.

Then if $N$ is an absorbing state of $\widetilde{P}$ and $P(y, y+1) > 0$ and $\widetilde{P}(y, y+1) > 0$ for all $y \in \{0, \dots, N-1\}$, the equivalence $\pi_n(N) > 0 \iff \widetilde{\pi}_n(N) > 0$ is satisfied, and so $N$ will be a witness state in $X$ that $\widetilde{X}$ hits the state $N$.

5.4. Times to absorption. From Proposition 14, for the BD chains the random time $\widetilde{T}_N$ starting from the state $0$ gives information on the speed of convergence to its invariant measure of the original BD chain $X$ starting from the state $0$. In the sequel we denote by $\widetilde{T}_{N;0}$ a random variable distributed as the hitting time $\widetilde{T}_N$ when starting from $0$, that is $\mathbb{P}(\widetilde{T}_{N;0} = n) = \mathbb{P}_0(\widetilde{T}_N = n)$ for $n \ge 1$. We denote its variance by $\mathrm{Var}(\widetilde{T}_{N;0})$.

For BD chains absorbed at $N$, the probability generating function of $\widetilde{T}_N$ starting from $0$ is, see [13] and [7],

(52) $\mathbb{E}\big( u^{\widetilde{T}_{N;0}} \big) = \prod_{k=1}^{N} \frac{(1 - t_k)\,u}{1 - t_k u}$, $u \in [0,1]$,

where $-1 \le t_k < 1$, $k = 1, \dots, N$, are the $N$ distinct eigenvalues of both $P$ and $\widetilde{P}$, avoiding $t_0 = 1$. The formula (52) also reads

$\mathbb{P}(\widetilde{T}_{N;0} > n) = \sum_{k=1}^{N} \Big( \prod_{l \ne k} \frac{1 - t_l}{t_k - t_l} \Big)\, t_k^n$, $n \ge N - 1$.

Then, $t_1^{-n}\, \mathbb{P}(\widetilde{T}_{N;0} > n) \to \prod_{l=2}^{N} \frac{1 - t_l}{t_1 - t_l}$ as $n \to \infty$, and $\widetilde{T}_{N;0}$ has geometric tails with exponent $t_1$. Also,

$\mathbb{E}(\widetilde{T}_{N;0}) = \sum_{k=1}^{N} (1 - t_k)^{-1}$ and $\mathrm{Var}(\widetilde{T}_{N;0}) = \sum_{k=1}^{N} (1 - t_k)^{-2} - \sum_{k=1}^{N} (1 - t_k)^{-1}$.

Since $t_1$ is the dominant eigenvalue

(53) $\mathrm{Var}(\widetilde{T}_{N;0}) \le \mathbb{E}(\widetilde{T}_{N;0})/(1 - t_1)$.

When the eigenvalues $t_k$ are non negative, then $\widetilde{T}_{N;0} \sim \sum_{k=1}^{N} \tau_k$ where the $\tau_k$'s are independent and $\tau_k \sim \mathrm{Geometric}(1 - t_k)$, the geometric distribution with success parameter $1 - t_k$ on $\{1, 2, \dots\}$. Assume that the eigenvalues $t_k$ are not all positive and put $t_N < \dots < t_{l+1} < 0 \le t_l < \dots < t_1 < t_0 = 1$. Then (52) interprets as:

$\widetilde{T}_{N;0} - \sum_{k=l+1}^{N} B_k \sim \sum_{k=1}^{l} \tau_k$,

where $B_k \sim \mathrm{Bernoulli}(1/(1 - t_k))$, $\tau_k \sim \mathrm{Geometric}(1 - t_k)$ and $\widetilde{T}_{N;0}$ are all mutually independent. All the previous results in this Subsection 5.4 can be found in [1], [2] and [7].

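When the $t_k$ are known explicitly it is possible to compute $\mathbb{E}(\widetilde{T}_{N;0})$ and $\mathrm{Var}(\widetilde{T}_{N;0})$. So, we can search for conditions under which

$\mathbb{E}(\widetilde{T}_{N;0}) \to \infty$ and $\mathrm{Var}(\widetilde{T}_{N;0})/\mathbb{E}(\widetilde{T}_{N;0})^2 \to 0$ as $N \to \infty$.

If this is the case, $\widetilde{T}_{N;0}/\mathbb{E}(\widetilde{T}_{N;0}) \to 1$ as $N \to \infty$ in probability, and $\mathbb{E}(\widetilde{T}_{N;0})$ is a cutoff time for $X$ started at $0$. In this goal, from (53) we get $\mathrm{Var}\big( \widetilde{T}_{N;0}/\mathbb{E}(\widetilde{T}_{N;0}) \big) \le 1/\big( (1 - t_1)\,\mathbb{E}(\widetilde{T}_{N;0}) \big)$. Then, $(1 - t_1)\,\mathbb{E}(\widetilde{T}_{N;0}) \to \infty$ as $N \to \infty$ is a sufficient condition for $\mathrm{Var}\big( \widetilde{T}_{N;0}/\mathbb{E}(\widetilde{T}_{N;0}) \big) \to 0$. See 5 for recent developments and precisions.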

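Example: Consider the Moran model with mutations, and put $a := a_1 + a_2$, $\overline{a} := 1 - a$. From (29) the eigenvalues $t_k$ verify: $1 - t_k = \frac{k}{N}\big( a + \overline{a}\,\frac{k-1}{N} \big)$. Using the approximation

$\mathbb{E}(\widetilde{T}_{N;0}) \approx N \int_0^1 \frac{dx}{(x + 1/N)(a + \overline{a}x)} = \frac{N^2}{Na - \overline{a}} \Big( \int_0^1 \frac{dx}{x + 1/N} - \overline{a} \int_0^1 \frac{dx}{a + \overline{a}x} \Big)$

we get

$\mathbb{E}(\widetilde{T}_{N;0}) \sim N(\log N + \log a)/a$ and $\mathrm{Var}(\widetilde{T}_{N;0}) \sim (N/a)^2$,

showing that $\mathrm{Var}\big( \widetilde{T}_{N;0}/\mathbb{E}(\widetilde{T}_{N;0}) \big) \sim (\log N)^{-2} \to 0$. The expected mixing time is $\mathbb{E}(\widetilde{T}_{N;0}) \sim N \log N / a$ whereas the spectral gap is $1 - t_1 = a/N$. □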
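The asymptotics of this example can be compared with exact finite-$N$ values, since the mean and variance are explicit sums over the spectral gaps $1 - t_k = \frac{k}{N}\big(a + (1-a)\frac{k-1}{N}\big)$. The following is a sketch with illustrative function names; it assumes the moment formulas $\mathbb{E} = \sum_k (1-t_k)^{-1}$ and $\mathrm{Var} = \sum_k (1-t_k)^{-2} - \sum_k (1-t_k)^{-1}$.

```python
def absorption_moments(N, a):
    """Exact mean and variance of the absorption time from the eigenvalue gaps."""
    gaps = [(k / N) * (a + (1.0 - a) * (k - 1) / N) for k in range(1, N + 1)]
    mean = sum(1.0 / g for g in gaps)
    var = sum(1.0 / g**2 for g in gaps) - mean
    return mean, var


def relative_variance(N, a):
    """Var(T) / E(T)^2, the quantity that must vanish for concentration."""
    mean, var = absorption_moments(N, a)
    return var / mean**2


ratios_a1 = [relative_variance(N, 1.0) for N in (10, 100, 1000)]
ratios_half = [relative_variance(N, 0.5) for N in (10, 100, 1000)]
mean_small, _ = absorption_moments(4, 1.0)     # equals 4 (1 + 1/2 + 1/3 + 1/4)
```

The ratios decrease only logarithmically in $N$, in line with the $(\log N)^{-2}$ rate stated in the example.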

In general, the values $t_k$ are not known. So it would be helpful to compute differently the mean and the variance of the absorption time $\widetilde{T}_{N;0}$. This is the goal of our next paragraph in the BD chain context.

The mean and the variance of the absorption time. Let us compute $\mathbb{E}(\widetilde{T}_{N;0})$ and $\mathrm{Var}(\widetilde{T}_{N;0})$ by the usual methods. We introduce the following sequence of independent random variables:

$(S_y : y = 0, \dots, N-1)$ with distribution $\mathbb{P}(S_y = n) = \mathbb{P}_y(\widetilde{T}_{y+1} = n)$ $\forall n \ge 1$,

so $S_y$ is a copy of the time spent in hitting $y+1$ when $\widetilde{X}$ starts from $y$. We also assume that the sequence $(S_y : y = 0, \dots, N-1)$ is independent of the Markov chain $\widetilde{X}$. Observe that

$\mathbb{P}\Big( \sum_{y=0}^{N-1} S_y = n \Big) = \mathbb{P}(\widetilde{T}_{N;0} = n)$ $\forall n \ge 1$.

When the initial condition is $\widetilde{X}_0 = y$, we have the representation

(54) $S_y \sim 1(\widetilde{X}_1 = y+1) + 1(\widetilde{X}_1 = y)(1 + S_y') + 1(\widetilde{X}_1 = y-1)(1 + S_{y-1} + S_y'')$,

where $S_y'$ and $S_y''$ are independent copies of $S_y$, which are independent from $\widetilde{X}$ and from the whole sequence $(S_y : y = 0, \dots, N-1)$. By taking expected values we find the recurrence relation

(55) $\mathbb{E}(S_y) = \frac{1}{\widetilde{p}_y} + \frac{\widetilde{q}_y}{\widetilde{p}_y}\, \mathbb{E}(S_{y-1})$.

Since $\mathbb{E}(S_0) = 1/\widetilde{p}_0$ we get by iteration,

(56) $\mathbb{E}(S_y) = \sum_{l=0}^{y} \frac{1}{\widetilde{p}_l} \prod_{r=l+1}^{y} \frac{\widetilde{q}_r}{\widetilde{p}_r}$,

and so the mean of the absorption time at $N$ starting from $0$ is,

$\mathbb{E}(\widetilde{T}_{N;0}) = \sum_{y=0}^{N-1} \Big( \sum_{l=0}^{y} \frac{1}{\widetilde{p}_l} \prod_{r=l+1}^{y} \frac{\widetilde{q}_r}{\widetilde{p}_r} \Big)$.
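The recurrence (55) and its iterated form (56) can be cross-checked against the usual first-step equations for the expected hitting time of $N$. This is a sketch for a hypothetical BD chain; `p` and `q` stand for the up and down probabilities of the chain whose absorption time is studied.

```python
def mean_passage_times(p, q):
    """E(S_y), y = 0..N-1, via E(S_y) = 1/p_y + (q_y/p_y) E(S_{y-1})."""
    means = []
    for y in range(len(p) - 1):
        previous = means[-1] if means else 0.0
        means.append(1.0 / p[y] + (q[y] / p[y]) * previous)
    return means


def closed_form(p, q, y):
    """sum_{l=0}^{y} (1/p_l) prod_{r=l+1}^{y} q_r/p_r."""
    total = 0.0
    for l in range(y + 1):
        term = 1.0 / p[l]
        for s in range(l + 1, y + 1):
            term *= q[s] / p[s]
        total += term
    return total


p = [0.5, 0.4, 0.3, 0.0]         # the last entry is unused: N = 3 is absorbing
q = [0.0, 0.2, 0.3, 0.0]
means = mean_passage_times(p, q)
mean_absorption = sum(means)

# h[x] = expected time to reach N from x, as a tail sum of the E(S_y).
N = len(p) - 1
h = [sum(means[x:]) for x in range(N)] + [0.0]
residuals = []
for x in range(N):
    r = 1.0 - p[x] - q[x]
    below = h[x - 1] if x > 0 else 0.0
    residuals.append(abs(h[x] - (1.0 + p[x] * h[x + 1] + r * h[x] + q[x] * below)))
```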

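Also from (54) we obtain

$S_y^2 \sim 1(\widetilde{X}_1 = y+1) + 1(\widetilde{X}_1 = y)\big( 1 + 2S_y' + S_y'^2 \big) + 1(\widetilde{X}_1 = y-1)\big( 1 + S_{y-1}^2 + S_y''^2 + 2S_{y-1} + 2S_y'' + 2S_{y-1}S_y'' \big)$.

Therefore

$\mathbb{E}(S_y^2) = \frac{1}{\widetilde{p}_y} + \frac{2\widetilde{r}_y}{\widetilde{p}_y}\, \mathbb{E}(S_y) + \frac{\widetilde{q}_y}{\widetilde{p}_y}\, \mathbb{E}(S_{y-1}^2) + \frac{2\widetilde{q}_y}{\widetilde{p}_y} \big( \mathbb{E}(S_{y-1}) + \mathbb{E}(S_y) + \mathbb{E}(S_{y-1})\,\mathbb{E}(S_y) \big)$.

From (55) we find that $\mathrm{Var}(S_y) = \mathbb{E}(S_y^2) - \mathbb{E}(S_y)^2$ verifies

$\mathrm{Var}(S_y) = \frac{1}{\widetilde{p}_y} + \frac{2\widetilde{r}_y}{\widetilde{p}_y}\, \mathbb{E}(S_y) + \frac{\widetilde{q}_y}{\widetilde{p}_y}\, \mathbb{E}(S_{y-1}^2) + \frac{2\widetilde{q}_y}{\widetilde{p}_y} \big( \mathbb{E}(S_{y-1}) + \mathbb{E}(S_y) + \mathbb{E}(S_{y-1})\,\mathbb{E}(S_y) \big) - \frac{1}{\widetilde{p}_y^2} - \frac{\widetilde{q}_y^2}{\widetilde{p}_y^2}\, \mathbb{E}(S_{y-1})^2 - \frac{2\widetilde{q}_y}{\widetilde{p}_y^2}\, \mathbb{E}(S_{y-1})$.

Therefore

(57) $\mathrm{Var}(S_y) = \frac{\widetilde{q}_y}{\widetilde{p}_y}\, \mathrm{Var}(S_{y-1}) + A_y$,

where

$A_y = \frac{\widetilde{p}_y - 1}{\widetilde{p}_y^2} + \frac{2(\widetilde{r}_y + \widetilde{q}_y)}{\widetilde{p}_y}\, \mathbb{E}(S_y) + \frac{2\widetilde{q}_y(\widetilde{p}_y - 1)}{\widetilde{p}_y^2}\, \mathbb{E}(S_{y-1}) + \frac{2\widetilde{q}_y}{\widetilde{p}_y}\, \mathbb{E}(S_{y-1})\,\mathbb{E}(S_y) - \frac{\widetilde{q}_y(\widetilde{q}_y - \widetilde{p}_y)}{\widetilde{p}_y^2}\, \mathbb{E}(S_{y-1})^2$.

Observe that from (56) the coefficient $A_y$ can be computed in terms of the parameters of the BD chain $\widetilde{X}$. In particular $A_0 = \mathrm{Var}(S_0) = (1 - \widetilde{p}_0)/\widetilde{p}_0^2$. From the recurrence formula (57) and the value for $\mathrm{Var}(S_0)$, we find the explicit expression,

$\mathrm{Var}(S_y) = \sum_{l=0}^{y} A_l \prod_{s=l+1}^{y} \frac{\widetilde{q}_s}{\widetilde{p}_s}$.

Therefore, by using independence, the variance of the hitting time of $N$ starting from $0$ is,

(58) $\mathrm{Var}(\widetilde{T}_{N;0}) = \sum_{y=0}^{N-1} \mathrm{Var}(S_y) = \sum_{y=0}^{N-1} \Big( \sum_{l=0}^{y} A_l \prod_{s=l+1}^{y} \frac{\widetilde{q}_s}{\widetilde{p}_s} \Big)$,

which can be explicitly computed simply in terms of the transition parameters of the BD chain $\widetilde{X}$.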

Remark 6. Even if the expressions of the mean and the variance in (56) and (58) do not require the knowledge of the spectrum, they are difficult to handle in terms of the parameters, so in general we are not able to use them to describe the behavior of the mean and the variance when $N$ is large.

6. The Hypergeometric dual 

For $I = \{0, \ldots, N\}$ let us suggest other potentially interesting examples of nonsingular duality kernels $H$ for which there exists a column of $H$ which is constant, so that Proposition 3 can be applied. For these examples, $H^{-1}$ is known explicitly, which turns out to be useful to decide whether for a given irreducible stochastic kernel the $H$-dual defines a substochastic matrix. If this occurs, interpreting the intertwining chain given by Theorem 2 remains a challenging problem for each specific case.

The Vandermonde and hypergeometric duals were first introduced in [18] in the context of neutral population genetics. In this context and also in nonneutral situations, the hypergeometric kernel plays a central role.

• Vandermonde. $H(x,y) = (x/N)^y$. In this case the $0$-th column is constant.

• Hypergeometric. $H(x,y) = \binom{N-x}{y} / \binom{N}{y}$. In this case $H = H'$, $H$ is upper-left triangular, and (51) is verified with $a = 0$ and $d = N$,

$$H e_0 = \mathbf{1} \quad \text{and} \quad e'_N H = e'_0. \tag{59}$$
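The properties listed for the two kernels can be checked mechanically for small $N$. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the kernels are taken to be $H(x,y) = (x/N)^y$ (with the convention $0^0 = 1$) and $H(x,y) = \binom{N-x}{y}/\binom{N}{y}$ on $\{0, \ldots, N\}$.

```python
from fractions import Fraction
from math import comb


def vandermonde_kernel(N):
    """H(x, y) = (x / N)^y on {0..N}; Fraction(0) ** 0 evaluates to 1."""
    return [[Fraction(x, N) ** y for y in range(N + 1)] for x in range(N + 1)]


def hypergeometric_kernel(N):
    """H(x, y) = C(N - x, y) / C(N, y); comb returns 0 when y > N - x."""
    return [[Fraction(comb(N - x, y), comb(N, y)) for y in range(N + 1)]
            for x in range(N + 1)]


def check_hypergeometric(N):
    """Return (symmetric, upper-left triangular, H e_0 = 1, e_N' H = e_0')."""
    H = hypergeometric_kernel(N)
    idx = range(N + 1)
    symmetric = all(H[x][y] == H[y][x] for x in idx for y in idx)
    # Upper-left triangular: the entry vanishes exactly when x + y > N.
    triangular = all((H[x][y] == 0) == (x + y > N) for x in idx for y in idx)
    column_zero_constant = all(H[x][0] == 1 for x in idx)
    row_N_is_e0 = H[N] == [Fraction(int(y == 0)) for y in idx]
    return symmetric, triangular, column_zero_constant, row_N_is_e0


print(check_hypergeometric(5))
print(all(row[0] == 1 for row in vandermonde_kernel(5)))
```

Exact rational arithmetic is used so that the equality tests are not affected by rounding.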
Let us comment on this choice of $H$.

When $P$ is given by the reversible Moran model with completely monotone nonneutrality bias mechanism, the $H$-dual kernel $\widehat{P}$ can be interpreted in terms of a multi-sex backward process akin to the coalescent. As shown in [11], for the Moran model with bias $p$ satisfying $p(0) \in (0,1)$ we have: $\widehat{P}\mathbf{1}(0) = 1$ and $0 < \widehat{P}\mathbf{1}(x) = 1 - \frac{x}{N}p(0) < 1$ for all $x \neq 0$. Since $p(0) \neq 0$, all the states of $\widehat{P}$ but $a = 0$ are mass-defective. The intertwining matrix $\widetilde{P}$ is the transition kernel of a skip-free to the left BD chain that can easily be obtained from [11], and $0$ is the unique absorbing state for $\widetilde{X}$. The relation (59) fulfills the hypotheses of Proposition [ill with $d = N$, so in the above Moran model the sharpness property is satisfied.
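The mass defect of the dual can be checked numerically. The sketch below rests on several assumptions that are not taken from the text above and should be read as hypothetical: the duality relation is taken in the form $H\widehat{P}' = PH$, the Moran chain is parametrized by $p_x = (1 - x/N)\,p(x/N)$ and $q_x = (x/N)\,(1 - p(x/N))$, and the sample bias is $p(u) = 1/(2(1+u))$. Only the row sums of the dual are examined; nonnegativity of its entries is not verified here.

```python
from fractions import Fraction
from math import comb


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Exact Gauss-Jordan inverse with row pivoting (M must be invertible)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def hypergeometric_kernel(N):
    return [[Fraction(comb(N - x, y), comb(N, y)) for y in range(N + 1)]
            for x in range(N + 1)]


def moran_kernel(N, bias):
    """Assumed parametrization: p_x = (1-u) p(u), q_x = u (1-p(u)), u = x/N."""
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for x in range(N + 1):
        u = Fraction(x, N)
        up, down = (1 - u) * bias(u), u * (1 - bias(u))
        if x < N:
            P[x][x + 1] = up
        if x > 0:
            P[x][x - 1] = down
        P[x][x] = 1 - up - down
    return P


def hypergeometric_dual(P):
    """Dual kernel under the assumed convention H * dual' = P * H."""
    H = hypergeometric_kernel(len(P) - 1)
    return transpose(matmul(matmul(inverse(H), P), H))


N = 4
bias = lambda u: Fraction(1, 2) / (1 + u)  # sample bias with p(0) = 1/2
dual = hypergeometric_dual(moran_kernel(N, bias))
row_sums = [sum(row) for row in dual]
print(row_sums)  # compare with 1 - (x/N) p(0), x = 0..N
```

Under these assumptions the row sums follow from reading the row $x = 0$ of the relation $H\widehat{P}' = PH$, since $H(0,y) = 1$ for all $y$ and $H(1,y) = (N-y)/N$.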

On the other hand the link matrix $\Lambda$ is upper-left triangular, stochastic and irreducible. Then, there exists a probability vector $\pi_\Lambda$ that verifies $\pi'_\Lambda \Lambda = \pi'_\Lambda$, so $\widetilde{\pi}_0 = \pi_0 = \pi_\Lambda$ is an admissible initial condition for $\widetilde{X}$ and $X$. Also, from $e'_N \Lambda = e'_0$ we get that another admissible initial condition is $\pi_0 = \delta_0$ and $\widetilde{\pi}_0 = \delta_N$. We can summarize this discussion in the following result.

Corollary 16. Let $X$ be the Moran chain with transition matrix $P$ fulfilling the above monotonicity conditions on $p$. Then, the construction of the intertwining kernel $\widetilde{P}$ in Theorem 2 starting from the hypergeometric dual $H$ can be done and the Markov chain $\widetilde{X}$ is well-defined. The absorbing state of $\widetilde{X}$ is $0$, the process $\widetilde{X}$ is a sharp dual of $X$, and $\widetilde{X}$ can be started at $N$ while $X$ starts at $0$.

Hence, the time $T_{0,N}$ at which $\widetilde{X}$ reaches $0$ when it starts from $N$ is the stochastically smallest time $T$ at which $X_T \sim \pi$, given $X_0 = 0$ and $\widetilde{X}_0 = N$. We point out that $T_{0,N}$ is distributed like the time $T_{N,0}$ to reach $N$ starting from $0$ of the Siegmund intertwining BD chain to the Moran model, namely like (52). This is in accordance with Theorem 1.2 of [5], stating that for a skip-free to the right Markov chain absorbed at $N$, the law of the time it takes to hit $N$ starting from $0$ is given by (52). This result can be transferred to our skip-free to the left BD chain case, while exchanging the boundaries $\{0, N\}$.

Let us finally consider the Wright-Fisher transition matrix $P$ given by

$$P(x,y) = \binom{N}{y}\, p(x/N)^y \,\bigl(1 - p(x/N)\bigr)^{N-y},$$

whose bias $p(u)$ is again a completely monotone function, satisfying $p(0) > 0$. This process is not reversible, nor is it in the BD class. However, using the hypergeometric duality kernel it was shown in [10] that the $H$-dual $\widehat{P}$ to $P$ in (2) defines a substochastic matrix. From Theorem 5 we conclude that the corresponding $\widetilde{P}$ is $\Lambda$-linked to $P$. □